-- Logs begin at Thu 2019-02-21 21:38:07 PST, end at Wed 2019-03-20 10:05:44 PDT. -- Feb 21 21:38:07 localhost.localdomain kernel: Initializing cgroup subsys cpuset Feb 21 21:38:07 localhost.localdomain kernel: Initializing cgroup subsys cpu Feb 21 21:38:07 localhost.localdomain kernel: Initializing cgroup subsys cpuacct Feb 21 21:38:07 localhost.localdomain kernel: Linux version 3.10.0-957.5.1.el7.x86_64 (mockbuild@kbuilder.bsys.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-36) (GCC) ) #1 SMP Fri Feb 1 14:54:57 UTC 2019 Feb 21 21:38:07 localhost.localdomain kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-3.10.0-957.5.1.el7.x86_64 root=UUID=daa3c77f-cf71-4825-a96b-3aac3f1a346b ro transparent_hugepage=madvise crashkernel=auto console=ttyS0,115200 LANG=en_US.UTF-8 Feb 21 21:38:07 localhost.localdomain kernel: e820: BIOS-provided physical RAM map: Feb 21 21:38:07 localhost.localdomain kernel: BIOS-e820: [mem 0x0000000000000000-0x000000000009cfff] usable Feb 21 21:38:07 localhost.localdomain kernel: BIOS-e820: [mem 0x000000000009d000-0x000000000009ffff] reserved Feb 21 21:38:07 localhost.localdomain kernel: BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved Feb 21 21:38:07 localhost.localdomain kernel: BIOS-e820: [mem 0x0000000000100000-0x000000007a288fff] usable Feb 21 21:38:07 localhost.localdomain kernel: BIOS-e820: [mem 0x000000007a289000-0x000000007af0afff] reserved Feb 21 21:38:07 localhost.localdomain kernel: BIOS-e820: [mem 0x000000007af0b000-0x000000007b93afff] ACPI NVS Feb 21 21:38:07 localhost.localdomain kernel: BIOS-e820: [mem 0x000000007b93b000-0x000000007bab5fff] ACPI data Feb 21 21:38:07 localhost.localdomain kernel: BIOS-e820: [mem 0x000000007bab6000-0x000000007bae8fff] usable Feb 21 21:38:07 localhost.localdomain kernel: BIOS-e820: [mem 0x000000007bae9000-0x000000007bafefff] ACPI data Feb 21 21:38:07 localhost.localdomain kernel: BIOS-e820: [mem 0x000000007baff000-0x000000007bafffff] usable Feb 21 21:38:07 localhost.localdomain 
kernel: BIOS-e820: [mem 0x000000007bb00000-0x000000008fffffff] reserved Feb 21 21:38:07 localhost.localdomain kernel: BIOS-e820: [mem 0x00000000feda8000-0x00000000fedabfff] reserved Feb 21 21:38:07 localhost.localdomain kernel: BIOS-e820: [mem 0x00000000ff310000-0x00000000ffffffff] reserved Feb 21 21:38:07 localhost.localdomain kernel: BIOS-e820: [mem 0x0000000100000000-0x000000207fffffff] usable Feb 21 21:38:07 localhost.localdomain kernel: NX (Execute Disable) protection: active Feb 21 21:38:07 localhost.localdomain kernel: SMBIOS 2.8 present. Feb 21 21:38:07 localhost.localdomain kernel: DMI: Dell Inc. PowerEdge C6320/082F9M, BIOS 2.8.0 05/28/2018 Feb 21 21:38:07 localhost.localdomain kernel: e820: update [mem 0x00000000-0x00000fff] usable ==> reserved Feb 21 21:38:07 localhost.localdomain kernel: e820: remove [mem 0x000a0000-0x000fffff] usable Feb 21 21:38:07 localhost.localdomain kernel: e820: last_pfn = 0x2080000 max_arch_pfn = 0x400000000 Feb 21 21:38:07 localhost.localdomain kernel: MTRR default type: uncachable Feb 21 21:38:07 localhost.localdomain kernel: MTRR fixed ranges enabled: Feb 21 21:38:07 localhost.localdomain kernel: 00000-9FFFF write-back Feb 21 21:38:07 localhost.localdomain kernel: A0000-BFFFF uncachable Feb 21 21:38:07 localhost.localdomain kernel: C0000-FFFFF write-protect Feb 21 21:38:07 localhost.localdomain kernel: MTRR variable ranges enabled: Feb 21 21:38:07 localhost.localdomain kernel: 0 base 000000000000 mask 3FFF80000000 write-back Feb 21 21:38:07 localhost.localdomain kernel: 1 base 000100000000 mask 3FFF00000000 write-back Feb 21 21:38:07 localhost.localdomain kernel: 2 base 000200000000 mask 3FFE00000000 write-back Feb 21 21:38:07 localhost.localdomain kernel: 3 base 000400000000 mask 3FFC00000000 write-back Feb 21 21:38:07 localhost.localdomain kernel: 4 base 000800000000 mask 3FF800000000 write-back Feb 21 21:38:07 localhost.localdomain kernel: 5 base 001000000000 mask 3FF000000000 write-back Feb 21 21:38:07 
localhost.localdomain kernel: 6 base 002000000000 mask 3FFF80000000 write-back Feb 21 21:38:07 localhost.localdomain kernel: 7 base 0000FF000000 mask 3FFFFF000000 write-protect Feb 21 21:38:07 localhost.localdomain kernel: 8 disabled Feb 21 21:38:07 localhost.localdomain kernel: 9 disabled Feb 21 21:38:07 localhost.localdomain kernel: PAT configuration [0-7]: WB WC UC- UC WB WP UC- UC Feb 21 21:38:07 localhost.localdomain kernel: e820: last_pfn = 0x7bb00 max_arch_pfn = 0x400000000 Feb 21 21:38:07 localhost.localdomain kernel: Base memory trampoline at [ffff9be5c0097000] 97000 size 24576 Feb 21 21:38:07 localhost.localdomain kernel: Using GB pages for direct mapping Feb 21 21:38:07 localhost.localdomain kernel: BRK [0x3d6052000, 0x3d6052fff] PGTABLE Feb 21 21:38:07 localhost.localdomain kernel: BRK [0x3d6053000, 0x3d6053fff] PGTABLE Feb 21 21:38:07 localhost.localdomain kernel: BRK [0x3d6054000, 0x3d6054fff] PGTABLE Feb 21 21:38:07 localhost.localdomain kernel: BRK [0x3d6055000, 0x3d6055fff] PGTABLE Feb 21 21:38:07 localhost.localdomain kernel: BRK [0x3d6056000, 0x3d6056fff] PGTABLE Feb 21 21:38:07 localhost.localdomain kernel: BRK [0x3d6057000, 0x3d6057fff] PGTABLE Feb 21 21:38:07 localhost.localdomain kernel: BRK [0x3d6058000, 0x3d6058fff] PGTABLE Feb 21 21:38:07 localhost.localdomain kernel: BRK [0x3d6059000, 0x3d6059fff] PGTABLE Feb 21 21:38:07 localhost.localdomain kernel: RAMDISK: [mem 0x359ce000-0x36cdefff] Feb 21 21:38:07 localhost.localdomain kernel: Early table checksum verification disabled Feb 21 21:38:07 localhost.localdomain kernel: ACPI: RSDP 00000000000fe320 00024 (v02 DELL ) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: XSDT 000000007bab40e8 000B4 (v01 DELL PE_SC3 00000000 01000013) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: FACP 000000007bab0000 000F4 (v04 DELL PE_SC3 00000000 DELL 00000001) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: DSDT 000000007ba9b000 0DC8B (v02 DELL PE_SC3 00000003 DELL 00000001) Feb 21 21:38:07 
localhost.localdomain kernel: ACPI: FACS 000000007b90b000 00040 Feb 21 21:38:07 localhost.localdomain kernel: ACPI: MCEJ 000000007bab3000 00130 (v01 INTEL 00000001 INTL 0100000D) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: WD__ 000000007bab2000 00134 (v01 DELL PE_SC3 00000001 DELL 00000001) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: SLIC 000000007bab1000 00024 (v01 DELL PE_SC3 00000001 DELL 00000001) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: HPET 000000007baaf000 00038 (v01 DELL PE_SC3 00000001 DELL 00000001) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: APIC 000000007baae000 00AFC (v02 DELL PE_SC3 00000000 DELL 00000001) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: MCFG 000000007baad000 0003C (v01 DELL PE_SC3 00000001 DELL 00000001) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: MSCT 000000007baac000 00090 (v01 DELL PE_SC3 00000001 DELL 00000001) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: SLIT 000000007baab000 0006C (v01 DELL PE_SC3 00000001 DELL 00000001) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: SRAT 000000007baa9000 01130 (v03 DELL PE_SC3 00000001 DELL 00000001) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: SSDT 000000007b958000 1424A9 (v02 DELL PE_SC3 00004000 INTL 20121114) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: SSDT 000000007b955000 0217F (v02 DELL PE_SC3 00000002 INTL 20121114) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: SSDT 000000007b954000 0006E (v02 DELL PE_SC3 00000002 INTL 20121114) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: PRAD 000000007b953000 00132 (v02 DELL PE_SC3 00000002 INTL 20121114) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: HEST 000000007bafe000 0017C (v01 DELL PE_SC3 00000002 DELL 00000001) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: BERT 000000007bafd000 00030 (v01 DELL PE_SC3 00000002 DELL 00000001) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: ERST 000000007bafc000 00230 (v01 DELL PE_SC3 00000002 DELL 00000001) Feb 
21 21:38:07 localhost.localdomain kernel: ACPI: EINJ 000000007bafb000 00150 (v01 DELL PE_SC3 00000002 DELL 00000001) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: Local APIC address 0xfee00000 Feb 21 21:38:07 localhost.localdomain kernel: SRAT: PXM 0 -> APIC 0x00 -> Node 0 Feb 21 21:38:07 localhost.localdomain kernel: SRAT: PXM 1 -> APIC 0x20 -> Node 1 Feb 21 21:38:07 localhost.localdomain kernel: SRAT: PXM 0 -> APIC 0x02 -> Node 0 Feb 21 21:38:07 localhost.localdomain kernel: SRAT: PXM 1 -> APIC 0x22 -> Node 1 Feb 21 21:38:07 localhost.localdomain kernel: SRAT: PXM 0 -> APIC 0x04 -> Node 0 Feb 21 21:38:07 localhost.localdomain kernel: SRAT: PXM 1 -> APIC 0x24 -> Node 1 Feb 21 21:38:07 localhost.localdomain kernel: SRAT: PXM 0 -> APIC 0x06 -> Node 0 Feb 21 21:38:07 localhost.localdomain kernel: SRAT: PXM 1 -> APIC 0x26 -> Node 1 Feb 21 21:38:07 localhost.localdomain kernel: SRAT: PXM 0 -> APIC 0x08 -> Node 0 Feb 21 21:38:07 localhost.localdomain kernel: SRAT: PXM 1 -> APIC 0x28 -> Node 1 Feb 21 21:38:07 localhost.localdomain kernel: SRAT: PXM 0 -> APIC 0x10 -> Node 0 Feb 21 21:38:07 localhost.localdomain kernel: SRAT: PXM 1 -> APIC 0x30 -> Node 1 Feb 21 21:38:07 localhost.localdomain kernel: SRAT: PXM 0 -> APIC 0x12 -> Node 0 Feb 21 21:38:07 localhost.localdomain kernel: SRAT: PXM 1 -> APIC 0x32 -> Node 1 Feb 21 21:38:07 localhost.localdomain kernel: SRAT: PXM 0 -> APIC 0x14 -> Node 0 Feb 21 21:38:07 localhost.localdomain kernel: SRAT: PXM 1 -> APIC 0x34 -> Node 1 Feb 21 21:38:07 localhost.localdomain kernel: SRAT: PXM 0 -> APIC 0x16 -> Node 0 Feb 21 21:38:07 localhost.localdomain kernel: SRAT: PXM 1 -> APIC 0x36 -> Node 1 Feb 21 21:38:07 localhost.localdomain kernel: SRAT: PXM 0 -> APIC 0x18 -> Node 0 Feb 21 21:38:07 localhost.localdomain kernel: SRAT: PXM 1 -> APIC 0x38 -> Node 1 Feb 21 21:38:07 localhost.localdomain kernel: SRAT: Node 0 PXM 0 [mem 0x00000000-0x107fffffff] Feb 21 21:38:07 localhost.localdomain kernel: SRAT: Node 1 PXM 1 [mem 
0x1080000000-0x207fffffff] Feb 21 21:38:07 localhost.localdomain kernel: NUMA: Initialized distance table, cnt=2 Feb 21 21:38:07 localhost.localdomain kernel: NODE_DATA(0) allocated [mem 0x107ffd9000-0x107fffffff] Feb 21 21:38:07 localhost.localdomain kernel: NODE_DATA(1) allocated [mem 0x207ffd8000-0x207fffefff] Feb 21 21:38:07 localhost.localdomain kernel: Reserving 168MB of memory at 688MB for crashkernel (System RAM: 130978MB) Feb 21 21:38:07 localhost.localdomain kernel: Zone ranges: Feb 21 21:38:07 localhost.localdomain kernel: DMA [mem 0x00001000-0x00ffffff] Feb 21 21:38:07 localhost.localdomain kernel: DMA32 [mem 0x01000000-0xffffffff] Feb 21 21:38:07 localhost.localdomain kernel: Normal [mem 0x100000000-0x207fffffff] Feb 21 21:38:07 localhost.localdomain kernel: Movable zone start for each node Feb 21 21:38:07 localhost.localdomain kernel: Early memory node ranges Feb 21 21:38:07 localhost.localdomain kernel: node 0: [mem 0x00001000-0x0009cfff] Feb 21 21:38:07 localhost.localdomain kernel: node 0: [mem 0x00100000-0x7a288fff] Feb 21 21:38:07 localhost.localdomain kernel: node 0: [mem 0x7bab6000-0x7bae8fff] Feb 21 21:38:07 localhost.localdomain kernel: node 0: [mem 0x7baff000-0x7bafffff] Feb 21 21:38:07 localhost.localdomain kernel: node 0: [mem 0x100000000-0x107fffffff] Feb 21 21:38:07 localhost.localdomain kernel: node 1: [mem 0x1080000000-0x207fffffff] Feb 21 21:38:07 localhost.localdomain kernel: Initmem setup node 0 [mem 0x00001000-0x107fffffff] Feb 21 21:38:07 localhost.localdomain kernel: On node 0 totalpages: 16753241 Feb 21 21:38:07 localhost.localdomain kernel: DMA zone: 64 pages used for memmap Feb 21 21:38:07 localhost.localdomain kernel: DMA zone: 21 pages reserved Feb 21 21:38:07 localhost.localdomain kernel: DMA zone: 3996 pages, LIFO batch:0 Feb 21 21:38:07 localhost.localdomain kernel: DMA32 zone: 7755 pages used for memmap Feb 21 21:38:07 localhost.localdomain kernel: DMA32 zone: 496317 pages, LIFO batch:31 Feb 21 21:38:07 
localhost.localdomain kernel: Normal zone: 253952 pages used for memmap Feb 21 21:38:07 localhost.localdomain kernel: Normal zone: 16252928 pages, LIFO batch:31 Feb 21 21:38:07 localhost.localdomain kernel: Initmem setup node 1 [mem 0x1080000000-0x207fffffff] Feb 21 21:38:07 localhost.localdomain kernel: On node 1 totalpages: 16777216 Feb 21 21:38:07 localhost.localdomain kernel: Normal zone: 262144 pages used for memmap Feb 21 21:38:07 localhost.localdomain kernel: Normal zone: 16777216 pages, LIFO batch:31 Feb 21 21:38:07 localhost.localdomain kernel: ACPI: PM-Timer IO Port: 0x408 Feb 21 21:38:07 localhost.localdomain kernel: ACPI: Local APIC address 0xfee00000 Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0x30] lapic_id[0x20] enabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0x01] lapic_id[0x02] enabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0x31] lapic_id[0x22] enabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0x02] lapic_id[0x04] enabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0x32] lapic_id[0x24] enabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0x03] lapic_id[0x06] enabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0x33] lapic_id[0x26] enabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0x04] lapic_id[0x08] enabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0x34] lapic_id[0x28] enabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0x05] lapic_id[0x10] enabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0x35] lapic_id[0x30] enabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0x06] lapic_id[0x12] enabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC 
(acpi_id[0x36] lapic_id[0x32] enabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0x07] lapic_id[0x14] enabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0x37] lapic_id[0x34] enabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0x08] lapic_id[0x16] enabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0x38] lapic_id[0x36] enabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0x09] lapic_id[0x18] enabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0x39] lapic_id[0x38] enabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 
21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: 
ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] 
lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 
21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: 
ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] 
lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 
21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: 
ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC (acpi_id[0xff] lapic_id[0xff] disabled) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x00] high level lint[0x1]) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x01] high level lint[0x1]) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x02] high level lint[0x1]) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x03] high level lint[0x1]) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x04] high 
level lint[0x1]) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x05] high level lint[0x1]) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x06] high level lint[0x1]) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x07] high level lint[0x1]) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x08] high level lint[0x1]) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x09] high level lint[0x1]) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x0a] high level lint[0x1]) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x0b] high level lint[0x1]) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x0c] high level lint[0x1]) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x0d] high level lint[0x1]) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x0e] high level lint[0x1]) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x0f] high level lint[0x1]) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x10] high level lint[0x1]) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x11] high level lint[0x1]) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x12] high level lint[0x1]) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x13] high level lint[0x1]) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x14] high level lint[0x1]) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x15] high level lint[0x1]) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x16] high level lint[0x1]) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x17] high level lint[0x1]) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x18] high level lint[0x1]) 
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x19] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x1a] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x1b] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x1c] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x1d] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x1e] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x1f] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x20] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x21] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x22] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x23] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x24] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x25] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x26] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x27] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x28] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x29] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x2a] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x2b] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x2c] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x2d] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x2e] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x2f] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x30] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x31] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x32] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x33] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x34] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x35] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x36] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x37] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x38] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x39] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x3a] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x3b] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x3c] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x3d] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x3e] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x3f] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x40] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x41] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x42] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x43] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x44] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x45] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x46] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x47] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x48] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x49] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x4a] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x4b] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x4c] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x4d] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x4e] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x4f] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x50] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x51] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x52] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x53] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x54] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x55] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x56] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x57] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x58] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x59] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x5a] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x5b] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x5c] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x5d] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x5e] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x5f] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x60] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x61] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x62] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x63] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x64] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x65] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x66] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x67] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x68] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x69] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x6a] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x6b] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x6c] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x6d] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x6e] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x6f] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x70] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x71] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x72] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x73] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x74] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x75] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x76] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x77] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x78] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x79] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x7a] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x7b] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x7c] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x7d] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x7e] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x7f] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x80] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x81] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x82] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x83] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x84] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x85] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x86] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x87] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x88] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x89] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x8a] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x8b] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x8c] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x8d] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x8e] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x8f] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x90] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x91] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x92] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x93] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x94] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x95] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x96] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x97] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x98] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x99] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x9a] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x9b] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x9c] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x9d] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x9e] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0x9f] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0xa0] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0xa1] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0xa2] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0xa3] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0xa4] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0xa5] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0xa6] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0xa7] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0xa8] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0xa9] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0xaa] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0xab] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0xac] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0xad] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0xae] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0xaf] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0xb0] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0xb1] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0xb2] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0xb3] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0xb4] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0xb5] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0xb6] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0xb7] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0xb8] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0xb9] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0xba] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0xbb] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0xbc] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0xbd] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0xbe] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: LAPIC_NMI (acpi_id[0xbf] high level lint[0x1])
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: IOAPIC (id[0x08] address[0xfec00000] gsi_base[0])
Feb 21 21:38:07 localhost.localdomain kernel: IOAPIC[0]: apic_id 8, version 32, address 0xfec00000, GSI 0-23
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: IOAPIC (id[0x09] address[0xfec01000] gsi_base[24])
Feb 21 21:38:07 localhost.localdomain kernel: IOAPIC[1]: apic_id 9, version 32, address 0xfec01000, GSI 24-47
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: IOAPIC (id[0x0a] address[0xfec40000] gsi_base[48])
Feb 21 21:38:07 localhost.localdomain kernel: IOAPIC[2]: apic_id 10, version 32, address 0xfec40000, GSI 48-71
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: IRQ0 used by override.
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: IRQ9 used by override.
Feb 21 21:38:07 localhost.localdomain kernel: Using ACPI (MADT) for SMP configuration information
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: HPET id: 0x8086a701 base: 0xfed00000
Feb 21 21:38:07 localhost.localdomain kernel: smpboot: Allowing 192 CPUs, 172 hotplug CPUs
Feb 21 21:38:07 localhost.localdomain kernel: PM: Registered nosave memory: [mem 0x0009d000-0x0009ffff]
Feb 21 21:38:07 localhost.localdomain kernel: PM: Registered nosave memory: [mem 0x000a0000-0x000dffff]
Feb 21 21:38:07 localhost.localdomain kernel: PM: Registered nosave memory: [mem 0x000e0000-0x000fffff]
Feb 21 21:38:07 localhost.localdomain kernel: PM: Registered nosave memory: [mem 0x7a289000-0x7af0afff]
Feb 21 21:38:07 localhost.localdomain kernel: PM: Registered nosave memory: [mem 0x7af0b000-0x7b93afff]
Feb 21 21:38:07 localhost.localdomain kernel: PM: Registered nosave memory: [mem 0x7b93b000-0x7bab5fff]
Feb 21 21:38:07 localhost.localdomain kernel: PM: Registered nosave memory: [mem 0x7bae9000-0x7bafefff]
Feb 21 21:38:07 localhost.localdomain kernel: PM: Registered nosave memory: [mem 0x7bb00000-0x8fffffff]
Feb 21 21:38:07 localhost.localdomain kernel: PM: Registered nosave memory: [mem 0x90000000-0xfeda7fff]
Feb 21 21:38:07 localhost.localdomain kernel: PM: Registered nosave memory: [mem 0xfeda8000-0xfedabfff]
Feb 21 21:38:07 localhost.localdomain kernel: PM: Registered nosave memory: [mem 0xfedac000-0xff30ffff]
Feb 21 21:38:07 localhost.localdomain kernel: PM: Registered nosave memory: [mem 0xff310000-0xffffffff]
Feb 21 21:38:07 localhost.localdomain kernel: e820: [mem 0x90000000-0xfeda7fff] available for PCI devices
Feb 21 21:38:07 localhost.localdomain kernel: Booting paravirtualized kernel on bare hardware
Feb 21 21:38:07 localhost.localdomain kernel: setup_percpu: NR_CPUS:5120 nr_cpumask_bits:192 nr_cpu_ids:192 nr_node_ids:2
Feb 21 21:38:07 localhost.localdomain kernel: PERCPU: Embedded 38 pages/cpu @ffff9bf5fe600000 s118784 r8192 d28672 u262144
Feb 21 21:38:07 localhost.localdomain kernel: pcpu-alloc: s118784 r8192 d28672 u262144 alloc=1*2097152
Feb 21 21:38:07 localhost.localdomain kernel: pcpu-alloc: [0] 000 002 004 006 008 010 012 014
Feb 21 21:38:07 localhost.localdomain kernel: pcpu-alloc: [0] 016 018 020 022 024 026 028 030
Feb 21 21:38:07 localhost.localdomain kernel: pcpu-alloc: [0] 032 034 036 038 040 042 044 046
Feb 21 21:38:07 localhost.localdomain kernel: pcpu-alloc: [0] 048 050 052 054 056 058 060 062
Feb 21 21:38:07 localhost.localdomain kernel: pcpu-alloc: [0] 064 066 068 070 072 074 076 078
Feb 21 21:38:07 localhost.localdomain kernel: pcpu-alloc: [0] 080 082 084 086 088 090 092 094
Feb 21 21:38:07 localhost.localdomain kernel: pcpu-alloc: [0] 096 098 100 102 104 106 108 110
Feb 21 21:38:07 localhost.localdomain kernel: pcpu-alloc: [0] 112 114 116 118 120 122 124 126
Feb 21 21:38:07 localhost.localdomain kernel: pcpu-alloc: [0] 128 130 132 134 136 138 140 142
Feb 21 21:38:07 localhost.localdomain kernel: pcpu-alloc: [0] 144 146 148 150 152 154 156 158
Feb 21 21:38:07 localhost.localdomain kernel: pcpu-alloc: [0] 160 162 164 166 168 170 172 174
Feb 21 21:38:07 localhost.localdomain kernel: pcpu-alloc: [0] 176 178 180 182 184 186 188 190
Feb 21 21:38:07 localhost.localdomain kernel: pcpu-alloc: [1] 001 003 005 007 009 011 013 015
Feb 21 21:38:07 localhost.localdomain kernel: pcpu-alloc: [1] 017 019 021 023 025 027 029 031
Feb 21 21:38:07 localhost.localdomain kernel: pcpu-alloc: [1] 033 035 037 039 041 043 045 047
Feb 21 21:38:07 localhost.localdomain kernel: pcpu-alloc: [1] 049 051 053 055 057 059 061 063
Feb 21 21:38:07 localhost.localdomain kernel: pcpu-alloc: [1] 065 067 069 071 073 075 077 079
Feb 21 21:38:07 localhost.localdomain kernel: pcpu-alloc: [1] 081 083 085 087 089 091 093 095
Feb 21 21:38:07 localhost.localdomain kernel: pcpu-alloc: [1] 097 099 101 103 105 107 109 111
Feb 21 21:38:07 localhost.localdomain kernel: pcpu-alloc: [1] 113 115 117 119 121 123 125 127
Feb 21 21:38:07 localhost.localdomain kernel: pcpu-alloc: [1] 129 131 133 135 137 139 141 143
Feb 21 21:38:07 localhost.localdomain kernel: pcpu-alloc: [1] 145 147 149 151 153 155 157 159
Feb 21 21:38:07 localhost.localdomain kernel: pcpu-alloc: [1] 161 163 165 167 169 171 173 175
Feb 21 21:38:07 localhost.localdomain kernel: pcpu-alloc: [1] 177 179 181 183 185 187 189 191
Feb 21 21:38:07 localhost.localdomain kernel: Built 2 zonelists in Zone order, mobility grouping on. Total pages: 33006521
Feb 21 21:38:07 localhost.localdomain kernel: Policy zone: Normal
Feb 21 21:38:07 localhost.localdomain kernel: Kernel command line: BOOT_IMAGE=/boot/vmlinuz-3.10.0-957.5.1.el7.x86_64 root=UUID=daa3c77f-cf71-4825-a96b-3aac3f1a346b ro transparent_hugepage=madvise crashkernel=auto console=ttyS0,115200 LANG=en_US.UTF-8
Feb 21 21:38:07 localhost.localdomain kernel: PID hash table entries: 4096 (order: 3, 32768 bytes)
Feb 21 21:38:07 localhost.localdomain kernel: x86/fpu: xstate_offset[2]: 0240, xstate_sizes[2]: 0100
Feb 21 21:38:07 localhost.localdomain kernel: xsave: enabled xstate_bv 0x7, cntxt size 0x340 using standard form
Feb 21 21:38:07 localhost.localdomain kernel: Memory: 5938348k/136314880k available (7664k kernel code, 2193052k absent, 2403576k reserved, 6055k data, 1876k init)
Feb 21 21:38:07 localhost.localdomain kernel: SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=192, Nodes=2
Feb 21 21:38:07 localhost.localdomain kernel: x86/pti: Unmapping kernel while in userspace
Feb 21 21:38:07 localhost.localdomain kernel: Hierarchical RCU implementation.
Feb 21 21:38:07 localhost.localdomain kernel: RCU restricting CPUs from NR_CPUS=5120 to nr_cpu_ids=192.
Feb 21 21:38:07 localhost.localdomain kernel: NR_IRQS:327936 nr_irqs:2776 0
Feb 21 21:38:07 localhost.localdomain kernel: Console: colour VGA+ 80x25
Feb 21 21:38:07 localhost.localdomain kernel: console [ttyS0] enabled
Feb 21 21:38:07 localhost.localdomain kernel: allocated 536870912 bytes of page_cgroup
Feb 21 21:38:07 localhost.localdomain kernel: please try 'cgroup_disable=memory' option if you don't want memory cgroups
Feb 21 21:38:07 localhost.localdomain kernel: Enabling automatic NUMA balancing. Configure with numa_balancing= or the kernel.numa_balancing sysctl
Feb 21 21:38:07 localhost.localdomain kernel: hpet clockevent registered
Feb 21 21:38:07 localhost.localdomain kernel: tsc: Fast TSC calibration using PIT
Feb 21 21:38:07 localhost.localdomain kernel: tsc: Detected 2394.361 MHz processor
Feb 21 21:38:07 localhost.localdomain kernel: Calibrating delay loop (skipped), value calculated using timer frequency.. 4788.72 BogoMIPS (lpj=2394361)
Feb 21 21:38:07 localhost.localdomain kernel: pid_max: default: 196608 minimum: 1536
Feb 21 21:38:07 localhost.localdomain kernel: Security Framework initialized
Feb 21 21:38:07 localhost.localdomain kernel: SELinux: Initializing.
Feb 21 21:38:07 localhost.localdomain kernel: SELinux: Starting in permissive mode
Feb 21 21:38:07 localhost.localdomain kernel: Yama: becoming mindful.
Feb 21 21:38:07 localhost.localdomain kernel: Dentry cache hash table entries: 16777216 (order: 15, 134217728 bytes)
Feb 21 21:38:07 localhost.localdomain kernel: Inode-cache hash table entries: 8388608 (order: 14, 67108864 bytes)
Feb 21 21:38:07 localhost.localdomain kernel: Mount-cache hash table entries: 262144 (order: 9, 2097152 bytes)
Feb 21 21:38:07 localhost.localdomain kernel: Mountpoint-cache hash table entries: 262144 (order: 9, 2097152 bytes)
Feb 21 21:38:07 localhost.localdomain kernel: Initializing cgroup subsys memory
Feb 21 21:38:07 localhost.localdomain kernel: Initializing cgroup subsys devices
Feb 21 21:38:07 localhost.localdomain kernel: Initializing cgroup subsys freezer
Feb 21 21:38:07 localhost.localdomain kernel: Initializing cgroup subsys net_cls
Feb 21 21:38:07 localhost.localdomain kernel: Initializing cgroup subsys blkio
Feb 21 21:38:07 localhost.localdomain kernel: Initializing cgroup subsys perf_event
Feb 21 21:38:07 localhost.localdomain kernel: Initializing cgroup subsys hugetlb
Feb 21 21:38:07 localhost.localdomain kernel: Initializing cgroup subsys pids
Feb 21 21:38:07 localhost.localdomain kernel: Initializing cgroup subsys net_prio
Feb 21 21:38:07 localhost.localdomain kernel: ENERGY_PERF_BIAS: Set to 'normal', was 'performance'
Feb 21 21:38:07 localhost.localdomain kernel: ENERGY_PERF_BIAS: View and update with x86_energy_perf_policy(8)
Feb 21 21:38:07 localhost.localdomain kernel: mce: CPU supports 22 MCE banks
Feb 21 21:38:07 localhost.localdomain kernel: CPU0: Thermal monitoring enabled (TM1)
Feb 21 21:38:07 localhost.localdomain kernel: Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0
Feb 21 21:38:07 localhost.localdomain kernel: Last level dTLB entries: 4KB 64, 2MB 0, 4MB 0
Feb 21 21:38:07 localhost.localdomain kernel: tlb_flushall_shift: 6
Feb 21 21:38:07 localhost.localdomain kernel: Speculative Store Bypass: Mitigation: Speculative Store Bypass disabled via prctl and seccomp
Feb 21 21:38:07 localhost.localdomain kernel: FEATURE SPEC_CTRL Present
Feb 21 21:38:07 localhost.localdomain kernel: FEATURE IBPB_SUPPORT Present
Feb 21 21:38:07 localhost.localdomain kernel: Spectre V2 : Mitigation: Full retpoline
Feb 21 21:38:07 localhost.localdomain kernel: Freeing SMP alternatives: 28k freed
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: Core revision 20130517
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: All ACPI Tables successfully acquired
Feb 21 21:38:07 localhost.localdomain kernel: ftrace: allocating 29189 entries in 115 pages
Feb 21 21:38:07 localhost.localdomain kernel: IRQ remapping doesn't support X2APIC mode, disable x2apic.
Feb 21 21:38:07 localhost.localdomain kernel: Switched APIC routing to physical flat.
Feb 21 21:38:07 localhost.localdomain kernel: ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
Feb 21 21:38:07 localhost.localdomain kernel: smpboot: CPU0: Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz (fam: 06, model: 4f, stepping: 01)
Feb 21 21:38:07 localhost.localdomain kernel: TSC deadline timer enabled
Feb 21 21:38:07 localhost.localdomain kernel: Performance Events: PEBS fmt2+, Broadwell events, 16-deep LBR, full-width counters, Intel PMU driver.
Feb 21 21:38:07 localhost.localdomain kernel: ... version:                3
Feb 21 21:38:07 localhost.localdomain kernel: ... bit width:              48
Feb 21 21:38:07 localhost.localdomain kernel: ... generic registers:      8
Feb 21 21:38:07 localhost.localdomain kernel: ... value mask:             0000ffffffffffff
Feb 21 21:38:07 localhost.localdomain kernel: ... max period:             00007fffffffffff
Feb 21 21:38:07 localhost.localdomain kernel: ... fixed-purpose events:   3
Feb 21 21:38:07 localhost.localdomain kernel: ... event mask:             00000007000000ff
Feb 21 21:38:07 localhost.localdomain kernel: NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
Feb 21 21:38:07 localhost.localdomain kernel: smpboot: Booting Node 1, Processors #1 OK
Feb 21 21:38:07 localhost.localdomain kernel: smpboot: Booting Node 0, Processors #2 OK
Feb 21 21:38:07 localhost.localdomain kernel: smpboot: Booting Node 1, Processors #3 OK
Feb 21 21:38:07 localhost.localdomain kernel: smpboot: Booting Node 0, Processors #4 OK
Feb 21 21:38:07 localhost.localdomain kernel: smpboot: Booting Node 1, Processors #5 OK
Feb 21 21:38:07 localhost.localdomain kernel: smpboot: Booting Node 0, Processors #6 OK
Feb 21 21:38:07 localhost.localdomain kernel: smpboot: Booting Node 1, Processors #7 OK
Feb 21 21:38:07 localhost.localdomain kernel: smpboot: Booting Node 0, Processors #8 OK
Feb 21 21:38:07 localhost.localdomain kernel: smpboot: Booting Node 1, Processors #9 OK
Feb 21 21:38:07 localhost.localdomain kernel: smpboot: Booting Node 0, Processors #10 OK
Feb 21 21:38:07 localhost.localdomain kernel: smpboot: Booting Node 1, Processors #11 OK
Feb 21 21:38:07 localhost.localdomain kernel: smpboot: Booting Node 0, Processors #12 OK
Feb 21 21:38:07 localhost.localdomain kernel: smpboot: Booting Node 1, Processors #13 OK
Feb 21 21:38:07 localhost.localdomain kernel: smpboot: Booting Node 0, Processors #14 OK
Feb 21 21:38:07 localhost.localdomain kernel: smpboot: Booting Node 1, Processors #15 OK
Feb 21 21:38:07 localhost.localdomain kernel: smpboot: Booting Node 0, Processors #16 OK
Feb 21 21:38:07 localhost.localdomain kernel: smpboot: Booting Node 1, Processors #17 OK
Feb 21 21:38:07 localhost.localdomain kernel: smpboot: Booting Node 0, Processors #18 OK
Feb 21 21:38:07 localhost.localdomain kernel: smpboot: Booting Node 1, Processors #19
Feb 21 21:38:07 localhost.localdomain kernel: Brought up 20 CPUs
Feb 21 21:38:07 localhost.localdomain kernel: smpboot: Max logical packages: 20
Feb 21 21:38:07 localhost.localdomain kernel: smpboot: Total of 20 processors activated (95834.68 BogoMIPS)
Feb 21 21:38:07 localhost.localdomain kernel: node 0 initialised, 15458022 pages in 257ms
Feb 21 21:38:07 localhost.localdomain kernel: node 1 initialised, 15986952 pages in 262ms
Feb 21 21:38:07 localhost.localdomain kernel: devtmpfs: initialized
Feb 21 21:38:07 localhost.localdomain kernel: EVM: security.selinux
Feb 21 21:38:07 localhost.localdomain kernel: EVM: security.ima
Feb 21 21:38:07 localhost.localdomain kernel: EVM: security.capability
Feb 21 21:38:07 localhost.localdomain kernel: PM: Registering ACPI NVS region [mem 0x7af0b000-0x7b93afff] (10682368 bytes)
Feb 21 21:38:07 localhost.localdomain kernel: atomic64 test passed for x86-64 platform with CX8 and with SSE
Feb 21 21:38:07 localhost.localdomain kernel: pinctrl core: initialized pinctrl subsystem
Feb 21 21:38:07 localhost.localdomain kernel: RTC time: 5:38:02, date: 02/22/19
Feb 21 21:38:07 localhost.localdomain kernel: NET: Registered protocol family 16
Feb 21 21:38:07 localhost.localdomain kernel: ACPI FADT declares the system doesn't support PCIe ASPM, so disable it
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: bus type PCI registered
Feb 21 21:38:07 localhost.localdomain kernel: acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
Feb 21 21:38:07 localhost.localdomain kernel: PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
Feb 21 21:38:07 localhost.localdomain kernel: PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved in E820
Feb 21 21:38:07 localhost.localdomain kernel: PCI: Using configuration type 1 for base access
Feb 21 21:38:07 localhost.localdomain kernel: PCI: Dell System detected, enabling pci=bfsort.
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: Added _OSI(Module Device)
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: Added _OSI(Processor Device)
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: Added _OSI(3.0 _SCP Extensions)
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: Added _OSI(Processor Aggregator Device)
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: Added _OSI(Linux-Dell-Video)
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: EC: Look up EC in DSDT
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: [Firmware Bug]: BIOS _OSI(Linux) query ignored
Feb 21 21:38:07 localhost.localdomain kernel: random: fast init done
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: Interpreter enabled
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: (supports S0 S5)
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: Using IOAPIC for interrupt routing
Feb 21 21:38:07 localhost.localdomain kernel: HEST: Table parsing has been initialized.
Feb 21 21:38:07 localhost.localdomain kernel: PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: GPE 0x16 active on init
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: GPE 0x24 active on init
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: Enabled 2 GPEs in block 00 to 3F
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: PCI Root Bridge [UNC1] (domain 0000 [bus ff])
Feb 21 21:38:07 localhost.localdomain kernel: acpi PNP0A03:02: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI]
Feb 21 21:38:07 localhost.localdomain kernel: acpi PNP0A03:02: PCIe AER handled by firmware
Feb 21 21:38:07 localhost.localdomain kernel: acpi PNP0A03:02: _OSC: OS now controls [PCIeHotplug SHPCHotplug PME PCIeCapability]
Feb 21 21:38:07 localhost.localdomain kernel: acpi PNP0A03:02: FADT indicates ASPM is unsupported, using BIOS configuration
Feb 21 21:38:07 localhost.localdomain kernel: PCI host bridge to bus 0000:ff
Feb 21 21:38:07 localhost.localdomain kernel: pci_bus 0000:ff: root bus resource [bus ff]
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:ff:08.0: [8086:6f80] type 00 class 0x088000
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:ff:08.2: [8086:6f32] type 00 class 0x110100
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:ff:08.3: [8086:6f83] type 00 class 0x088000
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:ff:09.0: [8086:6f90] type 00 class 0x088000
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:ff:09.2: [8086:6f33] type 00 class 0x110100
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:ff:09.3: [8086:6f93] type 00 class 0x088000
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:ff:0b.0: [8086:6f81] type 00 class 0x088000
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:ff:0b.1: [8086:6f36] type 00 class 0x110100
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:ff:0b.2: [8086:6f37] type 00 class 0x110100
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:ff:0b.3: [8086:6f76] type 00 class 0x088000
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:ff:0c.0: [8086:6fe0] type 00 class 0x088000
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:ff:0c.1: [8086:6fe1] type 00 class 0x088000
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:ff:0c.2: [8086:6fe2] type 00 class 0x088000
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:ff:0c.3: [8086:6fe3] type 00 class 0x088000
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:ff:0c.4: [8086:6fe4] type 00 class 0x088000
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:ff:0c.5: [8086:6fe5] type 00 class 0x088000
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:ff:0c.6: [8086:6fe6] type 00 class 0x088000
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:ff:0c.7: [8086:6fe7] type 00 class 0x088000
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:ff:0d.0: [8086:6fe8] type 00 class 0x088000
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:ff:0d.1: [8086:6fe9] type 00 class 0x088000
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:ff:0f.0: [8086:6ff8] type 00 class 0x088000
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:ff:0f.1: [8086:6ff9] type 00 class 0x088000
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:ff:0f.4: [8086:6ffc] type 00 class 0x088000
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:ff:0f.5: [8086:6ffd] type 00 class 0x088000
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:ff:0f.6: [8086:6ffe] type 00 class 0x088000
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:ff:10.0: [8086:6f1d] type 00 class 0x088000
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:ff:10.1: [8086:6f34] type 00 class 0x110100
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:ff:10.5: [8086:6f1e] type 00 class 0x088000
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:ff:10.6: [8086:6f7d] type 00 class 0x110100
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:ff:10.7: [8086:6f1f] type 00 class 0x088000
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:ff:12.0: [8086:6fa0] type 00 class 0x088000
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:ff:12.1: [8086:6f30] type 00 class 0x110100
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:ff:12.2: [8086:6f70] type 00 class 0x088000
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:ff:13.0: [8086:6fa8] type 00 class 0x088000
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:ff:13.1: [8086:6f71] type 00 class 0x088000
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:ff:13.2: [8086:6faa] type 00 class 0x088000
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:ff:13.3: [8086:6fab] type 00 class 0x088000
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:ff:13.4: [8086:6fac] type 00 class 0x088000
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:ff:13.5: [8086:6fad] type 00 class 0x088000
Feb 21 21:38:07
localhost.localdomain kernel: pci 0000:ff:13.6: [8086:6fae] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:ff:13.7: [8086:6faf] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:ff:14.0: [8086:6fb0] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:ff:14.1: [8086:6fb1] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:ff:14.2: [8086:6fb2] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:ff:14.3: [8086:6fb3] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:ff:14.4: [8086:6fbc] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:ff:14.5: [8086:6fbd] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:ff:14.6: [8086:6fbe] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:ff:14.7: [8086:6fbf] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:ff:15.0: [8086:6fb4] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:ff:15.1: [8086:6fb5] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:ff:15.2: [8086:6fb6] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:ff:15.3: [8086:6fb7] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:ff:16.0: [8086:6f68] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:ff:16.6: [8086:6f6e] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:ff:16.7: [8086:6f6f] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:ff:17.0: [8086:6fd0] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:ff:17.4: [8086:6fb8] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:ff:17.5: [8086:6fb9] type 00 class 0x088000 Feb 21 21:38:07 
localhost.localdomain kernel: pci 0000:ff:17.6: [8086:6fba] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:ff:17.7: [8086:6fbb] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:ff:1e.0: [8086:6f98] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:ff:1e.1: [8086:6f99] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:ff:1e.2: [8086:6f9a] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:ff:1e.3: [8086:6fc0] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:ff:1e.4: [8086:6f9c] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:ff:1f.0: [8086:6f88] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:ff:1f.2: [8086:6f8a] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: ACPI: PCI Root Bridge [UNC0] (domain 0000 [bus 7f]) Feb 21 21:38:07 localhost.localdomain kernel: acpi PNP0A03:03: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI] Feb 21 21:38:07 localhost.localdomain kernel: acpi PNP0A03:03: PCIe AER handled by firmware Feb 21 21:38:07 localhost.localdomain kernel: acpi PNP0A03:03: _OSC: OS now controls [PCIeHotplug SHPCHotplug PME PCIeCapability] Feb 21 21:38:07 localhost.localdomain kernel: acpi PNP0A03:03: FADT indicates ASPM is unsupported, using BIOS configuration Feb 21 21:38:07 localhost.localdomain kernel: PCI host bridge to bus 0000:7f Feb 21 21:38:07 localhost.localdomain kernel: pci_bus 0000:7f: root bus resource [bus 7f] Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:7f:08.0: [8086:6f80] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:7f:08.2: [8086:6f32] type 00 class 0x110100 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:7f:08.3: [8086:6f83] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:7f:09.0: [8086:6f90] type 00 class 
0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:7f:09.2: [8086:6f33] type 00 class 0x110100 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:7f:09.3: [8086:6f93] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:7f:0b.0: [8086:6f81] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:7f:0b.1: [8086:6f36] type 00 class 0x110100 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:7f:0b.2: [8086:6f37] type 00 class 0x110100 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:7f:0b.3: [8086:6f76] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:7f:0c.0: [8086:6fe0] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:7f:0c.1: [8086:6fe1] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:7f:0c.2: [8086:6fe2] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:7f:0c.3: [8086:6fe3] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:7f:0c.4: [8086:6fe4] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:7f:0c.5: [8086:6fe5] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:7f:0c.6: [8086:6fe6] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:7f:0c.7: [8086:6fe7] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:7f:0d.0: [8086:6fe8] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:7f:0d.1: [8086:6fe9] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:7f:0f.0: [8086:6ff8] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:7f:0f.1: [8086:6ff9] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:7f:0f.4: [8086:6ffc] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:7f:0f.5: [8086:6ffd] type 00 class 0x088000 Feb 21 
21:38:07 localhost.localdomain kernel: pci 0000:7f:0f.6: [8086:6ffe] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:7f:10.0: [8086:6f1d] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:7f:10.1: [8086:6f34] type 00 class 0x110100 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:7f:10.5: [8086:6f1e] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:7f:10.6: [8086:6f7d] type 00 class 0x110100 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:7f:10.7: [8086:6f1f] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:7f:12.0: [8086:6fa0] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:7f:12.1: [8086:6f30] type 00 class 0x110100 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:7f:12.2: [8086:6f70] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:7f:13.0: [8086:6fa8] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:7f:13.1: [8086:6f71] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:7f:13.2: [8086:6faa] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:7f:13.3: [8086:6fab] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:7f:13.4: [8086:6fac] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:7f:13.5: [8086:6fad] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:7f:13.6: [8086:6fae] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:7f:13.7: [8086:6faf] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:7f:14.0: [8086:6fb0] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:7f:14.1: [8086:6fb1] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:7f:14.2: [8086:6fb2] type 00 class 0x088000 Feb 21 21:38:07 
localhost.localdomain kernel: pci 0000:7f:14.3: [8086:6fb3] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:7f:14.4: [8086:6fbc] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:7f:14.5: [8086:6fbd] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:7f:14.6: [8086:6fbe] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:7f:14.7: [8086:6fbf] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:7f:15.0: [8086:6fb4] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:7f:15.1: [8086:6fb5] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:7f:15.2: [8086:6fb6] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:7f:15.3: [8086:6fb7] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:7f:16.0: [8086:6f68] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:7f:16.6: [8086:6f6e] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:7f:16.7: [8086:6f6f] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:7f:17.0: [8086:6fd0] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:7f:17.4: [8086:6fb8] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:7f:17.5: [8086:6fb9] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:7f:17.6: [8086:6fba] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:7f:17.7: [8086:6fbb] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:7f:1e.0: [8086:6f98] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:7f:1e.1: [8086:6f99] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:7f:1e.2: [8086:6f9a] type 00 class 0x088000 Feb 21 21:38:07 
localhost.localdomain kernel: pci 0000:7f:1e.3: [8086:6fc0] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:7f:1e.4: [8086:6f9c] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:7f:1f.0: [8086:6f88] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:7f:1f.2: [8086:6f8a] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-7e]) Feb 21 21:38:07 localhost.localdomain kernel: acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI] Feb 21 21:38:07 localhost.localdomain kernel: acpi PNP0A08:00: PCIe AER handled by firmware Feb 21 21:38:07 localhost.localdomain kernel: acpi PNP0A08:00: _OSC: platform does not support [SHPCHotplug] Feb 21 21:38:07 localhost.localdomain kernel: acpi PNP0A08:00: _OSC: OS now controls [PCIeHotplug PME PCIeCapability] Feb 21 21:38:07 localhost.localdomain kernel: acpi PNP0A08:00: FADT indicates ASPM is unsupported, using BIOS configuration Feb 21 21:38:07 localhost.localdomain kernel: PCI host bridge to bus 0000:00 Feb 21 21:38:07 localhost.localdomain kernel: pci_bus 0000:00: root bus resource [io 0x0000-0x03bb window] Feb 21 21:38:07 localhost.localdomain kernel: pci_bus 0000:00: root bus resource [io 0x03bc-0x03df window] Feb 21 21:38:07 localhost.localdomain kernel: pci_bus 0000:00: root bus resource [io 0x03e0-0x0cf7 window] Feb 21 21:38:07 localhost.localdomain kernel: pci_bus 0000:00: root bus resource [io 0x1000-0x7fff window] Feb 21 21:38:07 localhost.localdomain kernel: pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window] Feb 21 21:38:07 localhost.localdomain kernel: pci_bus 0000:00: root bus resource [mem 0x90000000-0xc7ffbfff window] Feb 21 21:38:07 localhost.localdomain kernel: pci_bus 0000:00: root bus resource [mem 0x38000000000-0x3bfffffffff window] Feb 21 21:38:07 localhost.localdomain kernel: pci_bus 0000:00: root bus resource [bus 00-7e] 
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:00:00.0: [8086:6f00] type 00 class 0x060000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:00:01.0: [8086:6f02] type 01 class 0x060400 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:00:01.0: PME# supported from D0 D3hot D3cold Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:00:01.0: System wakeup disabled by ACPI Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:00:02.0: [8086:6f04] type 01 class 0x060400 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:00:02.0: PME# supported from D0 D3hot D3cold Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:00:02.0: System wakeup disabled by ACPI Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:00:03.0: [8086:6f08] type 01 class 0x060400 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:00:03.0: PME# supported from D0 D3hot D3cold Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:00:03.0: System wakeup disabled by ACPI Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:00:05.0: [8086:6f28] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:00:05.1: [8086:6f29] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:00:05.2: [8086:6f2a] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:00:05.4: [8086:6f2c] type 00 class 0x080020 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:00:05.4: reg 0x10: [mem 0x93a13000-0x93a13fff] Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:00:11.0: [8086:8d7c] type 00 class 0xff0000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:00:14.0: [8086:8d31] type 00 class 0x0c0330 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:00:14.0: reg 0x10: [mem 0x93a00000-0x93a0ffff 64bit] Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:00:14.0: PME# supported from D3hot D3cold Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:00:14.0: System wakeup disabled by ACPI Feb 21 21:38:07 
localhost.localdomain kernel: pci 0000:00:16.0: [8086:8d3a] type 00 class 0x078000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:00:16.0: reg 0x10: [mem 0x3bffff03000-0x3bffff0300f 64bit] Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:00:16.0: PME# supported from D0 D3hot D3cold Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:00:16.1: [8086:8d3b] type 00 class 0x078000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:00:16.1: reg 0x10: [mem 0x3bffff02000-0x3bffff0200f 64bit] Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:00:16.1: PME# supported from D0 D3hot D3cold Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:00:1a.0: [8086:8d2d] type 00 class 0x0c0320 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:00:1a.0: reg 0x10: [mem 0x93a12000-0x93a123ff] Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:00:1a.0: PME# supported from D0 D3hot D3cold Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:00:1c.0: [8086:8d10] type 01 class 0x060400 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:00:1c.0: PME# supported from D0 D3hot D3cold Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:00:1c.7: [8086:8d1e] type 01 class 0x060400 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:00:1c.7: PME# supported from D0 D3hot D3cold Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:00:1c.7: System wakeup disabled by ACPI Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:00:1d.0: [8086:8d26] type 00 class 0x0c0320 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:00:1d.0: reg 0x10: [mem 0x93a11000-0x93a113ff] Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:00:1d.0: PME# supported from D0 D3hot D3cold Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:00:1f.0: [8086:8d44] type 00 class 0x060100 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:00:1f.2: [8086:8d02] type 00 class 0x010601 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:00:1f.2: reg 0x10: [io 0x2048-0x204f] Feb 
21 21:38:07 localhost.localdomain kernel: pci 0000:00:1f.2: reg 0x14: [io 0x2054-0x2057] Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:00:1f.2: reg 0x18: [io 0x2040-0x2047] Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:00:1f.2: reg 0x1c: [io 0x2050-0x2053] Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:00:1f.2: reg 0x20: [io 0x2020-0x203f] Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:00:1f.2: reg 0x24: [mem 0x93a10000-0x93a107ff] Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:00:1f.2: PME# supported from D3hot Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:00:01.0: PCI bridge to [bus 02] Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:00:02.0: PCI bridge to [bus 03] Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:01:00.0: [15b3:1013] type 00 class 0x020700 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:01:00.0: reg 0x10: [mem 0x90000000-0x91ffffff 64bit pref] Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:01:00.0: reg 0x30: [mem 0xfff00000-0xffffffff pref] Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:01:00.0: PME# supported from D3cold Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:00:03.0: PCI bridge to [bus 01] Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:00:03.0: bridge window [mem 0x90000000-0x91ffffff 64bit pref] Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:00:1c.0: PCI bridge to [bus 04] Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:05:00.0: [1912:001d] type 01 class 0x060400 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:05:00.0: PME# supported from D0 D3hot D3cold Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:00:1c.7: PCI bridge to [bus 05-09] Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:00:1c.7: bridge window [mem 0x93000000-0x939fffff] Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:00:1c.7: bridge window [mem 0x92000000-0x92ffffff 64bit pref] Feb 21 21:38:07 localhost.localdomain kernel: pci 
0000:06:00.0: [1912:001d] type 01 class 0x060400 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:06:00.0: PME# supported from D0 D3hot D3cold Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:05:00.0: PCI bridge to [bus 06-09] Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:05:00.0: bridge window [mem 0x93000000-0x939fffff] Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:05:00.0: bridge window [mem 0x92000000-0x92ffffff 64bit pref] Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:07:00.0: [1912:001a] type 01 class 0x060400 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:06:00.0: PCI bridge to [bus 07-08] Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:06:00.0: bridge window [mem 0x93000000-0x938fffff] Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:06:00.0: bridge window [mem 0x92000000-0x92ffffff 64bit pref] Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:08:00.0: [102b:0534] type 00 class 0x030000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:08:00.0: reg 0x10: [mem 0x92000000-0x92ffffff pref] Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:08:00.0: reg 0x14: [mem 0x93800000-0x93803fff] Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:08:00.0: reg 0x18: [mem 0x93000000-0x937fffff] Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:07:00.0: PCI bridge to [bus 08] Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:07:00.0: bridge window [mem 0x93000000-0x938fffff] Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:07:00.0: bridge window [mem 0x92000000-0x92ffffff 64bit pref] Feb 21 21:38:07 localhost.localdomain kernel: pci_bus 0000:00: on NUMA node 0 Feb 21 21:38:07 localhost.localdomain kernel: ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 7 9 10 11 12 14 *15) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 *6 7 9 10 11 12 14 15) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 5 6 7 9 10 11 
12 *14 15) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 6 7 9 10 *11 12 14 15) Feb 21 21:38:07 localhost.localdomain kernel: ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 5 6 7 9 10 11 12 14 15) *0, disabled. Feb 21 21:38:07 localhost.localdomain kernel: ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 5 6 7 9 10 11 12 14 15) *0, disabled. Feb 21 21:38:07 localhost.localdomain kernel: ACPI: PCI Interrupt Link [LNKG] (IRQs 3 4 5 6 7 9 10 11 12 14 15) *0, disabled. Feb 21 21:38:07 localhost.localdomain kernel: ACPI: PCI Interrupt Link [LNKH] (IRQs 3 4 5 6 7 9 10 11 12 14 15) *0, disabled. Feb 21 21:38:07 localhost.localdomain kernel: ACPI: PCI Root Bridge [PCI1] (domain 0000 [bus 80-fe]) Feb 21 21:38:07 localhost.localdomain kernel: acpi PNP0A08:01: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI] Feb 21 21:38:07 localhost.localdomain kernel: acpi PNP0A08:01: PCIe AER handled by firmware Feb 21 21:38:07 localhost.localdomain kernel: acpi PNP0A08:01: _OSC: OS now controls [PCIeHotplug SHPCHotplug PME PCIeCapability] Feb 21 21:38:07 localhost.localdomain kernel: acpi PNP0A08:01: FADT indicates ASPM is unsupported, using BIOS configuration Feb 21 21:38:07 localhost.localdomain kernel: PCI host bridge to bus 0000:80 Feb 21 21:38:07 localhost.localdomain kernel: pci_bus 0000:80: root bus resource [io 0x8000-0xffff window] Feb 21 21:38:07 localhost.localdomain kernel: pci_bus 0000:80: root bus resource [mem 0xc8000000-0xfbffbfff window] Feb 21 21:38:07 localhost.localdomain kernel: pci_bus 0000:80: root bus resource [mem 0x3c000000000-0x3ffffffffff window] Feb 21 21:38:07 localhost.localdomain kernel: pci_bus 0000:80: root bus resource [bus 80-fe] Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:80:01.0: [8086:6f02] type 01 class 0x060400 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:80:01.0: PME# supported from D0 D3hot D3cold Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:80:01.0: System wakeup disabled 
by ACPI Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:80:02.0: [8086:6f04] type 01 class 0x060400 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:80:02.0: PME# supported from D0 D3hot D3cold Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:80:02.0: System wakeup disabled by ACPI Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:80:05.0: [8086:6f28] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:80:05.1: [8086:6f29] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:80:05.2: [8086:6f2a] type 00 class 0x088000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:80:05.4: [8086:6f2c] type 00 class 0x080020 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:80:05.4: reg 0x10: [mem 0xca100000-0xca100fff] Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:81:00.0: [8086:10fb] type 00 class 0x020000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:81:00.0: reg 0x10: [mem 0xc9000000-0xc9ffffff 64bit] Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:81:00.0: reg 0x18: [io 0x8020-0x803f] Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:81:00.0: reg 0x20: [mem 0xca004000-0xca007fff 64bit] Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:81:00.0: reg 0x30: [mem 0xff800000-0xffffffff pref] Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:81:00.0: PME# supported from D0 D3hot Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:81:00.0: reg 0x184: [mem 0x00000000-0x00003fff 64bit pref] Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:81:00.0: VF(n) BAR0 space: [mem 0x00000000-0x000fffff 64bit pref] (contains BAR0 for 64 VFs) Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:81:00.0: reg 0x190: [mem 0x00000000-0x00003fff 64bit pref] Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:81:00.0: VF(n) BAR3 space: [mem 0x00000000-0x000fffff 64bit pref] (contains BAR3 for 64 VFs) Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:81:00.1: 
[8086:10fb] type 00 class 0x020000 Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:81:00.1: reg 0x10: [mem 0xc8000000-0xc8ffffff 64bit] Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:81:00.1: reg 0x18: [io 0x8000-0x801f] Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:81:00.1: reg 0x20: [mem 0xca000000-0xca003fff 64bit] Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:81:00.1: reg 0x30: [mem 0xff800000-0xffffffff pref] Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:81:00.1: PME# supported from D0 D3hot Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:81:00.1: reg 0x184: [mem 0x00000000-0x00003fff 64bit pref] Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:81:00.1: VF(n) BAR0 space: [mem 0x00000000-0x000fffff 64bit pref] (contains BAR0 for 64 VFs) Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:81:00.1: reg 0x190: [mem 0x00000000-0x00003fff 64bit pref] Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:81:00.1: VF(n) BAR3 space: [mem 0x00000000-0x000fffff 64bit pref] (contains BAR3 for 64 VFs) Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:80:01.0: PCI bridge to [bus 81] Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:80:01.0: bridge window [io 0x8000-0x8fff] Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:80:01.0: bridge window [mem 0xc8000000-0xca0fffff] Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:80:02.0: PCI bridge to [bus 82] Feb 21 21:38:07 localhost.localdomain kernel: pci_bus 0000:80: on NUMA node 1 Feb 21 21:38:07 localhost.localdomain kernel: vgaarb: device added: PCI:0000:08:00.0,decodes=io+mem,owns=io+mem,locks=none Feb 21 21:38:07 localhost.localdomain kernel: vgaarb: loaded Feb 21 21:38:07 localhost.localdomain kernel: vgaarb: bridge control possible 0000:08:00.0 Feb 21 21:38:07 localhost.localdomain kernel: SCSI subsystem initialized Feb 21 21:38:07 localhost.localdomain kernel: ACPI: bus type USB registered Feb 21 21:38:07 localhost.localdomain kernel: 
usbcore: registered new interface driver usbfs Feb 21 21:38:07 localhost.localdomain kernel: usbcore: registered new interface driver hub Feb 21 21:38:07 localhost.localdomain kernel: usbcore: registered new device driver usb Feb 21 21:38:07 localhost.localdomain kernel: EDAC MC: Ver: 3.0.0 Feb 21 21:38:07 localhost.localdomain kernel: PCI: Using ACPI for IRQ routing Feb 21 21:38:07 localhost.localdomain kernel: PCI: pci_cache_line_size set to 64 bytes Feb 21 21:38:07 localhost.localdomain kernel: e820: reserve RAM buffer [mem 0x0009d000-0x0009ffff] Feb 21 21:38:07 localhost.localdomain kernel: e820: reserve RAM buffer [mem 0x7a289000-0x7bffffff] Feb 21 21:38:07 localhost.localdomain kernel: e820: reserve RAM buffer [mem 0x7bae9000-0x7bffffff] Feb 21 21:38:07 localhost.localdomain kernel: e820: reserve RAM buffer [mem 0x7bb00000-0x7bffffff] Feb 21 21:38:07 localhost.localdomain kernel: NetLabel: Initializing Feb 21 21:38:07 localhost.localdomain kernel: NetLabel: domain hash size = 128 Feb 21 21:38:07 localhost.localdomain kernel: NetLabel: protocols = UNLABELED CIPSOv4 Feb 21 21:38:07 localhost.localdomain kernel: NetLabel: unlabeled traffic allowed by default Feb 21 21:38:07 localhost.localdomain kernel: hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0, 0, 0, 0, 0, 0 Feb 21 21:38:07 localhost.localdomain kernel: hpet0: 8 comparators, 64-bit 14.318180 MHz counter Feb 21 21:38:07 localhost.localdomain kernel: amd_nb: Cannot enumerate AMD northbridges Feb 21 21:38:07 localhost.localdomain kernel: Switched to clocksource hpet Feb 21 21:38:07 localhost.localdomain kernel: pnp: PnP ACPI init Feb 21 21:38:07 localhost.localdomain kernel: ACPI: bus type PNP registered Feb 21 21:38:07 localhost.localdomain kernel: pnp 00:00: Plug and Play ACPI device, IDs PNP0b00 (active) Feb 21 21:38:07 localhost.localdomain kernel: system 00:01: [io 0x0500-0x053f] has been reserved Feb 21 21:38:07 localhost.localdomain kernel: system 00:01: [io 0x0400-0x047f] could not be reserved Feb 21 
Feb 21 21:38:07 localhost.localdomain kernel: system 00:01: [io 0x0540-0x057f] has been reserved
Feb 21 21:38:07 localhost.localdomain kernel: system 00:01: [io 0x0600-0x061f] has been reserved
Feb 21 21:38:07 localhost.localdomain kernel: system 00:01: [io 0x0ca0-0x0ca5] has been reserved
Feb 21 21:38:07 localhost.localdomain kernel: system 00:01: [io 0x0880-0x0883] has been reserved
Feb 21 21:38:07 localhost.localdomain kernel: system 00:01: [io 0x0800-0x081f] has been reserved
Feb 21 21:38:07 localhost.localdomain kernel: system 00:01: [mem 0xfeda8000-0xfedcbfff] could not be reserved
Feb 21 21:38:07 localhost.localdomain kernel: system 00:01: [mem 0xff000000-0xffffffff] could not be reserved
Feb 21 21:38:07 localhost.localdomain kernel: system 00:01: [mem 0xfee00000-0xfeefffff] has been reserved
Feb 21 21:38:07 localhost.localdomain kernel: system 00:01: [mem 0xfed12000-0xfed1200f] has been reserved
Feb 21 21:38:07 localhost.localdomain kernel: system 00:01: [mem 0xfed12010-0xfed1201f] has been reserved
Feb 21 21:38:07 localhost.localdomain kernel: system 00:01: [mem 0xfed1b000-0xfed1bfff] has been reserved
Feb 21 21:38:07 localhost.localdomain kernel: system 00:01: Plug and Play ACPI device, IDs PNP0c02 (active)
Feb 21 21:38:07 localhost.localdomain kernel: pnp 00:02: Plug and Play ACPI device, IDs PNP0501 (active)
Feb 21 21:38:07 localhost.localdomain kernel: pnp 00:03: Plug and Play ACPI device, IDs PNP0501 (active)
Feb 21 21:38:07 localhost.localdomain kernel: system 00:04: [io 0x0ca8] has been reserved
Feb 21 21:38:07 localhost.localdomain kernel: system 00:04: [io 0x0cac] has been reserved
Feb 21 21:38:07 localhost.localdomain kernel: system 00:04: Plug and Play ACPI device, IDs IPI0001 PNP0c01 (active)
Feb 21 21:38:07 localhost.localdomain kernel: pnp: PnP ACPI: found 5 devices
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: bus type PNP unregistered
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:01:00.0: can't claim BAR 6 [mem 0xfff00000-0xffffffff pref]: no compatible bridge window
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:81:00.0: can't claim BAR 6 [mem 0xff800000-0xffffffff pref]: no compatible bridge window
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:81:00.1: can't claim BAR 6 [mem 0xff800000-0xffffffff pref]: no compatible bridge window
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:00:03.0: BAR 14: assigned [mem 0x93b00000-0x93bfffff]
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:00:01.0: PCI bridge to [bus 02]
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:00:02.0: PCI bridge to [bus 03]
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:01:00.0: BAR 6: assigned [mem 0x93b00000-0x93bfffff pref]
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:00:03.0: PCI bridge to [bus 01]
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:00:03.0: bridge window [mem 0x93b00000-0x93bfffff]
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:00:03.0: bridge window [mem 0x90000000-0x91ffffff 64bit pref]
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:00:1c.0: PCI bridge to [bus 04]
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:07:00.0: PCI bridge to [bus 08]
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:07:00.0: bridge window [mem 0x93000000-0x938fffff]
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:07:00.0: bridge window [mem 0x92000000-0x92ffffff 64bit pref]
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:06:00.0: PCI bridge to [bus 07-08]
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:06:00.0: bridge window [mem 0x93000000-0x938fffff]
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:06:00.0: bridge window [mem 0x92000000-0x92ffffff 64bit pref]
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:05:00.0: PCI bridge to [bus 06-09]
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:05:00.0: bridge window [mem 0x93000000-0x939fffff]
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:05:00.0: bridge window [mem 0x92000000-0x92ffffff 64bit pref]
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:00:1c.7: PCI bridge to [bus 05-09]
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:00:1c.7: bridge window [mem 0x93000000-0x939fffff]
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:00:1c.7: bridge window [mem 0x92000000-0x92ffffff 64bit pref]
Feb 21 21:38:07 localhost.localdomain kernel: pci_bus 0000:00: resource 4 [io 0x0000-0x03bb window]
Feb 21 21:38:07 localhost.localdomain kernel: pci_bus 0000:00: resource 5 [io 0x03bc-0x03df window]
Feb 21 21:38:07 localhost.localdomain kernel: pci_bus 0000:00: resource 6 [io 0x03e0-0x0cf7 window]
Feb 21 21:38:07 localhost.localdomain kernel: pci_bus 0000:00: resource 7 [io 0x1000-0x7fff window]
Feb 21 21:38:07 localhost.localdomain kernel: pci_bus 0000:00: resource 8 [mem 0x000a0000-0x000bffff window]
Feb 21 21:38:07 localhost.localdomain kernel: pci_bus 0000:00: resource 9 [mem 0x90000000-0xc7ffbfff window]
Feb 21 21:38:07 localhost.localdomain kernel: pci_bus 0000:00: resource 10 [mem 0x38000000000-0x3bfffffffff window]
Feb 21 21:38:07 localhost.localdomain kernel: pci_bus 0000:01: resource 1 [mem 0x93b00000-0x93bfffff]
Feb 21 21:38:07 localhost.localdomain kernel: pci_bus 0000:01: resource 2 [mem 0x90000000-0x91ffffff 64bit pref]
Feb 21 21:38:07 localhost.localdomain kernel: pci_bus 0000:05: resource 1 [mem 0x93000000-0x939fffff]
Feb 21 21:38:07 localhost.localdomain kernel: pci_bus 0000:05: resource 2 [mem 0x92000000-0x92ffffff 64bit pref]
Feb 21 21:38:07 localhost.localdomain kernel: pci_bus 0000:06: resource 1 [mem 0x93000000-0x939fffff]
Feb 21 21:38:07 localhost.localdomain kernel: pci_bus 0000:06: resource 2 [mem 0x92000000-0x92ffffff 64bit pref]
Feb 21 21:38:07 localhost.localdomain kernel: pci_bus 0000:07: resource 1 [mem 0x93000000-0x938fffff]
Feb 21 21:38:07 localhost.localdomain kernel: pci_bus 0000:07: resource 2 [mem 0x92000000-0x92ffffff 64bit pref]
Feb 21 21:38:07 localhost.localdomain kernel: pci_bus 0000:08: resource 1 [mem 0x93000000-0x938fffff]
Feb 21 21:38:07 localhost.localdomain kernel: pci_bus 0000:08: resource 2 [mem 0x92000000-0x92ffffff 64bit pref]
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:80:01.0: bridge window [mem 0x00100000-0x000fffff 64bit pref] to [bus 81] add_size 400000 add_align 100000
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:80:01.0: res[15]=[mem 0x00100000-0x000fffff 64bit pref] res_to_dev_res add_size 400000 min_align 100000
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:80:01.0: res[15]=[mem 0x00100000-0x004fffff 64bit pref] res_to_dev_res add_size 400000 min_align 100000
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:80:01.0: BAR 15: assigned [mem 0x3c000000000-0x3c0003fffff 64bit pref]
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:81:00.0: res[7]=[mem 0x00000000-0xffffffffffffffff 64bit pref] res_to_dev_res add_size 100000 min_align 0
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:81:00.0: res[10]=[mem 0x00000000-0xffffffffffffffff 64bit pref] res_to_dev_res add_size 100000 min_align 0
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:81:00.1: res[7]=[mem 0x00000000-0xffffffffffffffff 64bit pref] res_to_dev_res add_size 100000 min_align 0
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:81:00.1: res[10]=[mem 0x00000000-0xffffffffffffffff 64bit pref] res_to_dev_res add_size 100000 min_align 0
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:81:00.0: BAR 6: no space for [mem size 0x00800000 pref]
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:81:00.0: BAR 6: failed to assign [mem size 0x00800000 pref]
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:81:00.1: BAR 6: no space for [mem size 0x00800000 pref]
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:81:00.1: BAR 6: failed to assign [mem size 0x00800000 pref]
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:81:00.0: BAR 7: assigned [mem 0x3c000000000-0x3c0000fffff 64bit pref]
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:81:00.0: BAR 10: assigned [mem 0x3c000100000-0x3c0001fffff 64bit pref]
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:81:00.1: BAR 7: assigned [mem 0x3c000200000-0x3c0002fffff 64bit pref]
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:81:00.1: BAR 10: assigned [mem 0x3c000300000-0x3c0003fffff 64bit pref]
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:80:01.0: PCI bridge to [bus 81]
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:80:01.0: bridge window [io 0x8000-0x8fff]
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:80:01.0: bridge window [mem 0xc8000000-0xca0fffff]
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:80:01.0: bridge window [mem 0x3c000000000-0x3c0003fffff 64bit pref]
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:80:02.0: PCI bridge to [bus 82]
Feb 21 21:38:07 localhost.localdomain kernel: pci_bus 0000:80: resource 4 [io 0x8000-0xffff window]
Feb 21 21:38:07 localhost.localdomain kernel: pci_bus 0000:80: resource 5 [mem 0xc8000000-0xfbffbfff window]
Feb 21 21:38:07 localhost.localdomain kernel: pci_bus 0000:80: resource 6 [mem 0x3c000000000-0x3ffffffffff window]
Feb 21 21:38:07 localhost.localdomain kernel: pci_bus 0000:81: resource 0 [io 0x8000-0x8fff]
Feb 21 21:38:07 localhost.localdomain kernel: pci_bus 0000:81: resource 1 [mem 0xc8000000-0xca0fffff]
Feb 21 21:38:07 localhost.localdomain kernel: pci_bus 0000:81: resource 2 [mem 0x3c000000000-0x3c0003fffff 64bit pref]
Feb 21 21:38:07 localhost.localdomain kernel: NET: Registered protocol family 2
Feb 21 21:38:07 localhost.localdomain kernel: TCP established hash table entries: 524288 (order: 10, 4194304 bytes)
Feb 21 21:38:07 localhost.localdomain kernel: TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
Feb 21 21:38:07 localhost.localdomain kernel: TCP: Hash tables configured (established 524288 bind 65536)
Feb 21 21:38:07 localhost.localdomain kernel: TCP: reno registered
Feb 21 21:38:07 localhost.localdomain kernel: UDP hash table entries: 65536 (order: 9, 2097152 bytes)
Feb 21 21:38:07 localhost.localdomain kernel: UDP-Lite hash table entries: 65536 (order: 9, 2097152 bytes)
Feb 21 21:38:07 localhost.localdomain kernel: NET: Registered protocol family 1
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:08:00.0: Boot video device
Feb 21 21:38:07 localhost.localdomain kernel: PCI: CLS 32 bytes, default 64
Feb 21 21:38:07 localhost.localdomain kernel: Unpacking initramfs...
Feb 21 21:38:07 localhost.localdomain kernel: Freeing initrd memory: 19524k freed
Feb 21 21:38:07 localhost.localdomain kernel: PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
Feb 21 21:38:07 localhost.localdomain kernel: software IO TLB [mem 0x76289000-0x7a289000] (64MB) mapped at [ffff9be636289000-ffff9be63a288fff]
Feb 21 21:38:07 localhost.localdomain kernel: RAPL PMU: API unit is 2^-32 Joules, 3 fixed counters, 655360 ms ovfl timer
Feb 21 21:38:07 localhost.localdomain kernel: RAPL PMU: hw unit of domain pp0-core 2^-14 Joules
Feb 21 21:38:07 localhost.localdomain kernel: RAPL PMU: hw unit of domain package 2^-14 Joules
Feb 21 21:38:07 localhost.localdomain kernel: RAPL PMU: hw unit of domain dram 2^-16 Joules
Feb 21 21:38:07 localhost.localdomain kernel: sha1_ssse3: Using AVX2 optimized SHA-1 implementation
Feb 21 21:38:07 localhost.localdomain kernel: sha256_ssse3: Using AVX2 optimized SHA-256 implementation
Feb 21 21:38:07 localhost.localdomain kernel: futex hash table entries: 65536 (order: 10, 4194304 bytes)
Feb 21 21:38:07 localhost.localdomain kernel: Initialise system trusted keyring
Feb 21 21:38:07 localhost.localdomain kernel: audit: initializing netlink socket (disabled)
Feb 21 21:38:07 localhost.localdomain kernel: type=2000 audit(1550813879.361:1): initialized
Feb 21 21:38:07 localhost.localdomain kernel: HugeTLB registered 1 GB page size, pre-allocated 0 pages
Feb 21 21:38:07 localhost.localdomain kernel: HugeTLB registered 2 MB page size, pre-allocated 0 pages
Feb 21 21:38:07 localhost.localdomain kernel: zpool: loaded
Feb 21 21:38:07 localhost.localdomain kernel: zbud: loaded
Feb 21 21:38:07 localhost.localdomain kernel: VFS: Disk quotas dquot_6.5.2
Feb 21 21:38:07 localhost.localdomain kernel: Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
Feb 21 21:38:07 localhost.localdomain kernel: msgmni has been set to 32768
Feb 21 21:38:07 localhost.localdomain kernel: Key type big_key registered
Feb 21 21:38:07 localhost.localdomain kernel: SELinux: Registering netfilter hooks
Feb 21 21:38:07 localhost.localdomain kernel: NET: Registered protocol family 38
Feb 21 21:38:07 localhost.localdomain kernel: Key type asymmetric registered
Feb 21 21:38:07 localhost.localdomain kernel: Asymmetric key parser 'x509' registered
Feb 21 21:38:07 localhost.localdomain kernel: Block layer SCSI generic (bsg) driver version 0.4 loaded (major 248)
Feb 21 21:38:07 localhost.localdomain kernel: io scheduler noop registered
Feb 21 21:38:07 localhost.localdomain kernel: io scheduler deadline registered (default)
Feb 21 21:38:07 localhost.localdomain kernel: io scheduler cfq registered
Feb 21 21:38:07 localhost.localdomain kernel: io scheduler mq-deadline registered
Feb 21 21:38:07 localhost.localdomain kernel: io scheduler kyber registered
Feb 21 21:38:07 localhost.localdomain kernel: pcieport 0000:00:01.0: irq 25 for MSI/MSI-X
Feb 21 21:38:07 localhost.localdomain kernel: pcieport 0000:00:02.0: irq 26 for MSI/MSI-X
Feb 21 21:38:07 localhost.localdomain kernel: pcieport 0000:00:03.0: irq 27 for MSI/MSI-X
Feb 21 21:38:07 localhost.localdomain kernel: pcieport 0000:00:1c.0: irq 28 for MSI/MSI-X
Feb 21 21:38:07 localhost.localdomain kernel: pcieport 0000:00:1c.7: irq 29 for MSI/MSI-X
Feb 21 21:38:07 localhost.localdomain kernel: pcieport 0000:80:01.0: irq 31 for MSI/MSI-X
Feb 21 21:38:07 localhost.localdomain kernel: pcieport 0000:80:02.0: irq 32 for MSI/MSI-X
Feb 21 21:38:07 localhost.localdomain kernel: pcieport 0000:00:01.0: Signaling PME through PCIe PME interrupt
Feb 21 21:38:07 localhost.localdomain kernel: pcie_pme 0000:00:01.0:pcie001: service driver pcie_pme loaded
Feb 21 21:38:07 localhost.localdomain kernel: pcieport 0000:00:02.0: Signaling PME through PCIe PME interrupt
Feb 21 21:38:07 localhost.localdomain kernel: pcie_pme 0000:00:02.0:pcie001: service driver pcie_pme loaded
Feb 21 21:38:07 localhost.localdomain kernel: pcieport 0000:00:03.0: Signaling PME through PCIe PME interrupt
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:01:00.0: Signaling PME through PCIe PME interrupt
Feb 21 21:38:07 localhost.localdomain kernel: pcie_pme 0000:00:03.0:pcie001: service driver pcie_pme loaded
Feb 21 21:38:07 localhost.localdomain kernel: pcieport 0000:00:1c.0: Signaling PME through PCIe PME interrupt
Feb 21 21:38:07 localhost.localdomain kernel: pcie_pme 0000:00:1c.0:pcie001: service driver pcie_pme loaded
Feb 21 21:38:07 localhost.localdomain kernel: pcieport 0000:00:1c.7: Signaling PME through PCIe PME interrupt
Feb 21 21:38:07 localhost.localdomain kernel: pcieport 0000:05:00.0: Signaling PME through PCIe PME interrupt
Feb 21 21:38:07 localhost.localdomain kernel: pcieport 0000:06:00.0: Signaling PME through PCIe PME interrupt
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:07:00.0: Signaling PME through PCIe PME interrupt
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:08:00.0: Signaling PME through PCIe PME interrupt
Feb 21 21:38:07 localhost.localdomain kernel: pcie_pme 0000:00:1c.7:pcie001: service driver pcie_pme loaded
Feb 21 21:38:07 localhost.localdomain kernel: pcieport 0000:80:01.0: Signaling PME through PCIe PME interrupt
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:81:00.0: Signaling PME through PCIe PME interrupt
Feb 21 21:38:07 localhost.localdomain kernel: pci 0000:81:00.1: Signaling PME through PCIe PME interrupt
Feb 21 21:38:07 localhost.localdomain kernel: pcie_pme 0000:80:01.0:pcie001: service driver pcie_pme loaded
Feb 21 21:38:07 localhost.localdomain kernel: pcieport 0000:80:02.0: Signaling PME through PCIe PME interrupt
Feb 21 21:38:07 localhost.localdomain kernel: pcie_pme 0000:80:02.0:pcie001: service driver pcie_pme loaded
Feb 21 21:38:07 localhost.localdomain kernel: pci_hotplug: PCI Hot Plug PCI Core version: 0.5
Feb 21 21:38:07 localhost.localdomain kernel: pciehp: PCI Express Hot Plug Controller Driver version: 0.4
Feb 21 21:38:07 localhost.localdomain kernel: shpchp 0000:07:00.0: Cannot get control of SHPC hotplug
Feb 21 21:38:07 localhost.localdomain kernel: shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
Feb 21 21:38:07 localhost.localdomain kernel: intel_idle: MWAIT substates: 0x2120
Feb 21 21:38:07 localhost.localdomain kernel: intel_idle: v0.4.1 model 0x4F
Feb 21 21:38:07 localhost.localdomain kernel: intel_idle: lapic_timer_reliable_states 0xffffffff
Feb 21 21:38:07 localhost.localdomain kernel: input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input0
Feb 21 21:38:07 localhost.localdomain kernel: ACPI: Power Button [PWRF]
Feb 21 21:38:07 localhost.localdomain kernel: ERST: Error Record Serialization Table (ERST) support is initialized.
Feb 21 21:38:07 localhost.localdomain kernel: pstore: Registered erst as persistent store backend
Feb 21 21:38:07 localhost.localdomain kernel: GHES: APEI firmware first mode is enabled by APEI bit and WHEA _OSC.
Feb 21 21:38:07 localhost.localdomain kernel: Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
Feb 21 21:38:07 localhost.localdomain kernel: 00:02: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
Feb 21 21:38:07 localhost.localdomain kernel: 00:03: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
Feb 21 21:38:07 localhost.localdomain kernel: Non-volatile memory driver v1.3
Feb 21 21:38:07 localhost.localdomain kernel: Linux agpgart interface v0.103
Feb 21 21:38:07 localhost.localdomain kernel: crash memory driver: version 1.1
Feb 21 21:38:07 localhost.localdomain kernel: rdac: device handler registered
Feb 21 21:38:07 localhost.localdomain kernel: hp_sw: device handler registered
Feb 21 21:38:07 localhost.localdomain kernel: emc: device handler registered
Feb 21 21:38:07 localhost.localdomain kernel: alua: device handler registered
Feb 21 21:38:07 localhost.localdomain kernel: libphy: Fixed MDIO Bus: probed
Feb 21 21:38:07 localhost.localdomain kernel: ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
Feb 21 21:38:07 localhost.localdomain kernel: ehci-pci: EHCI PCI platform driver
Feb 21 21:38:07 localhost.localdomain kernel: ehci-pci 0000:00:1a.0: EHCI Host Controller
Feb 21 21:38:07 localhost.localdomain kernel: ehci-pci 0000:00:1a.0: new USB bus registered, assigned bus number 1
Feb 21 21:38:07 localhost.localdomain kernel: ehci-pci 0000:00:1a.0: debug port 2
Feb 21 21:38:07 localhost.localdomain kernel: ehci-pci 0000:00:1a.0: cache line size of 32 is not supported
Feb 21 21:38:07 localhost.localdomain kernel: ehci-pci 0000:00:1a.0: irq 18, io mem 0x93a12000
Feb 21 21:38:07 localhost.localdomain kernel: ehci-pci 0000:00:1a.0: USB 2.0 started, EHCI 1.00
Feb 21 21:38:07 localhost.localdomain kernel: usb usb1: New USB device found, idVendor=1d6b, idProduct=0002
Feb 21 21:38:07 localhost.localdomain kernel: usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
Feb 21 21:38:07 localhost.localdomain kernel: usb usb1: Product: EHCI Host Controller
Feb 21 21:38:07 localhost.localdomain kernel: usb usb1: Manufacturer: Linux 3.10.0-957.5.1.el7.x86_64 ehci_hcd
Feb 21 21:38:07 localhost.localdomain kernel: usb usb1: SerialNumber: 0000:00:1a.0
Feb 21 21:38:07 localhost.localdomain kernel: hub 1-0:1.0: USB hub found
Feb 21 21:38:07 localhost.localdomain kernel: hub 1-0:1.0: 2 ports detected
Feb 21 21:38:07 localhost.localdomain kernel: ehci-pci 0000:00:1d.0: EHCI Host Controller
Feb 21 21:38:07 localhost.localdomain kernel: ehci-pci 0000:00:1d.0: new USB bus registered, assigned bus number 2
Feb 21 21:38:07 localhost.localdomain kernel: ehci-pci 0000:00:1d.0: debug port 2
Feb 21 21:38:07 localhost.localdomain kernel: ehci-pci 0000:00:1d.0: cache line size of 32 is not supported
Feb 21 21:38:07 localhost.localdomain kernel: ehci-pci 0000:00:1d.0: irq 18, io mem 0x93a11000
Feb 21 21:38:07 localhost.localdomain kernel: ehci-pci 0000:00:1d.0: USB 2.0 started, EHCI 1.00
Feb 21 21:38:07 localhost.localdomain kernel: usb usb2: New USB device found, idVendor=1d6b, idProduct=0002
Feb 21 21:38:07 localhost.localdomain kernel: usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1
Feb 21 21:38:07 localhost.localdomain kernel: usb usb2: Product: EHCI Host Controller
Feb 21 21:38:07 localhost.localdomain kernel: usb usb2: Manufacturer: Linux 3.10.0-957.5.1.el7.x86_64 ehci_hcd
Feb 21 21:38:07 localhost.localdomain kernel: usb usb2: SerialNumber: 0000:00:1d.0
Feb 21 21:38:07 localhost.localdomain kernel: hub 2-0:1.0: USB hub found
Feb 21 21:38:07 localhost.localdomain kernel: hub 2-0:1.0: 2 ports detected
Feb 21 21:38:07 localhost.localdomain kernel: ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
Feb 21 21:38:07 localhost.localdomain kernel: ohci-pci: OHCI PCI platform driver
Feb 21 21:38:07 localhost.localdomain kernel: uhci_hcd: USB Universal Host Controller Interface driver
Feb 21 21:38:07 localhost.localdomain kernel: xhci_hcd 0000:00:14.0: xHCI Host Controller
Feb 21 21:38:07 localhost.localdomain kernel: xhci_hcd 0000:00:14.0: new USB bus registered, assigned bus number 3
Feb 21 21:38:07 localhost.localdomain kernel: xhci_hcd 0000:00:14.0: hcc params 0x200077c1 hci version 0x100 quirks 0x00009810
Feb 21 21:38:07 localhost.localdomain kernel: xhci_hcd 0000:00:14.0: cache line size of 32 is not supported
Feb 21 21:38:07 localhost.localdomain kernel: xhci_hcd 0000:00:14.0: irq 33 for MSI/MSI-X
Feb 21 21:38:07 localhost.localdomain kernel: usb usb3: New USB device found, idVendor=1d6b, idProduct=0002
Feb 21 21:38:07 localhost.localdomain kernel: usb usb3: New USB device strings: Mfr=3, Product=2, SerialNumber=1
Feb 21 21:38:07 localhost.localdomain kernel: usb usb3: Product: xHCI Host Controller
Feb 21 21:38:07 localhost.localdomain kernel: usb usb3: Manufacturer: Linux 3.10.0-957.5.1.el7.x86_64 xhci-hcd
Feb 21 21:38:07 localhost.localdomain kernel: usb usb3: SerialNumber: 0000:00:14.0
Feb 21 21:38:07 localhost.localdomain kernel: hub 3-0:1.0: USB hub found
Feb 21 21:38:07 localhost.localdomain kernel: hub 3-0:1.0: 15 ports detected
Feb 21 21:38:07 localhost.localdomain kernel: xhci_hcd 0000:00:14.0: xHCI Host Controller
Feb 21 21:38:07 localhost.localdomain kernel: xhci_hcd 0000:00:14.0: new USB bus registered, assigned bus number 4
Feb 21 21:38:07 localhost.localdomain kernel: usb usb4: New USB device found, idVendor=1d6b, idProduct=0003
Feb 21 21:38:07 localhost.localdomain kernel: usb usb4: New USB device strings: Mfr=3, Product=2, SerialNumber=1
Feb 21 21:38:07 localhost.localdomain kernel: usb usb4: Product: xHCI Host Controller
Feb 21 21:38:07 localhost.localdomain kernel: usb usb4: Manufacturer: Linux 3.10.0-957.5.1.el7.x86_64 xhci-hcd
Feb 21 21:38:07 localhost.localdomain kernel: usb usb4: SerialNumber: 0000:00:14.0
Feb 21 21:38:07 localhost.localdomain kernel: hub 4-0:1.0: USB hub found
Feb 21 21:38:07 localhost.localdomain kernel: hub 4-0:1.0: 6 ports detected
Feb 21 21:38:07 localhost.localdomain kernel: usbcore: registered new interface driver usbserial_generic
Feb 21 21:38:07 localhost.localdomain kernel: usbserial: USB Serial support registered for generic
Feb 21 21:38:07 localhost.localdomain kernel: i8042: PNP: No PS/2 controller found. Probing ports directly.
Feb 21 21:38:07 localhost.localdomain kernel: usb 1-1: new high-speed USB device number 2 using ehci-pci
Feb 21 21:38:07 localhost.localdomain kernel: tsc: Refined TSC clocksource calibration: 2394.454 MHz
Feb 21 21:38:07 localhost.localdomain kernel: i8042: No controller found
Feb 21 21:38:07 localhost.localdomain kernel: usb 2-1: new high-speed USB device number 2 using ehci-pci
Feb 21 21:38:07 localhost.localdomain kernel: mousedev: PS/2 mouse device common for all mice
Feb 21 21:38:07 localhost.localdomain kernel: rtc_cmos 00:00: RTC can wake from S4
Feb 21 21:38:07 localhost.localdomain kernel: rtc_cmos 00:00: rtc core: registered rtc_cmos as rtc0
Feb 21 21:38:07 localhost.localdomain kernel: rtc_cmos 00:00: alarms up to one month, y3k, 114 bytes nvram, hpet irqs
Feb 21 21:38:07 localhost.localdomain kernel: intel_pstate: Intel P-state driver initializing
Feb 21 21:38:07 localhost.localdomain kernel: cpuidle: using governor menu
Feb 21 21:38:07 localhost.localdomain kernel: hidraw: raw HID events driver (C) Jiri Kosina
Feb 21 21:38:07 localhost.localdomain kernel: usbcore: registered new interface driver usbhid
Feb 21 21:38:07 localhost.localdomain kernel: usbhid: USB HID core driver
Feb 21 21:38:07 localhost.localdomain kernel: drop_monitor: Initializing network drop monitor service
Feb 21 21:38:07 localhost.localdomain kernel: TCP: cubic registered
Feb 21 21:38:07 localhost.localdomain kernel: Initializing XFRM netlink socket
Feb 21 21:38:07 localhost.localdomain kernel: NET: Registered protocol family 10
Feb 21 21:38:07 localhost.localdomain kernel: NET: Registered protocol family 17
Feb 21 21:38:07 localhost.localdomain kernel: mpls_gso: MPLS GSO support
Feb 21 21:38:07 localhost.localdomain kernel: intel_rdt: Intel RDT L3 allocation detected
Feb 21 21:38:07 localhost.localdomain kernel: intel_rdt: Intel RDT L3DATA allocation detected
Feb 21 21:38:07 localhost.localdomain kernel: intel_rdt: Intel RDT L3CODE allocation detected
Feb 21 21:38:07 localhost.localdomain kernel: intel_rdt: Intel RDT L3 monitoring detected
Feb 21 21:38:07 localhost.localdomain kernel: microcode: sig=0x406f1, pf=0x1, revision=0xb00002e
Feb 21 21:38:07 localhost.localdomain kernel: microcode: Microcode Update Driver: v2.01 , Peter Oruba
Feb 21 21:38:07 localhost.localdomain kernel: PM: Hibernation image not present or could not be loaded.
Feb 21 21:38:07 localhost.localdomain kernel: Loading compiled-in X.509 certificates
Feb 21 21:38:07 localhost.localdomain kernel: Loaded X.509 cert 'CentOS Linux kpatch signing key: ea0413152cde1d98ebdca3fe6f0230904c9ef717'
Feb 21 21:38:07 localhost.localdomain kernel: Loaded X.509 cert 'CentOS Linux Driver update signing key: 7f421ee0ab69461574bb358861dbe77762a4201b'
Feb 21 21:38:07 localhost.localdomain kernel: Loaded X.509 cert 'CentOS Linux kernel signing key: 9db78ad7c3e3338cdb7a0d8a8d08f880b4148d5c'
Feb 21 21:38:07 localhost.localdomain kernel: registered taskstats version 1
Feb 21 21:38:07 localhost.localdomain kernel: Key type trusted registered
Feb 21 21:38:07 localhost.localdomain kernel: Key type encrypted registered
Feb 21 21:38:07 localhost.localdomain kernel: IMA: No TPM chip found, activating TPM-bypass! (rc=-19)
Feb 21 21:38:07 localhost.localdomain kernel: Magic number: 7:736:618
Feb 21 21:38:07 localhost.localdomain kernel: rtc_cmos 00:00: setting system clock to 2019-02-22 05:38:06 UTC (1550813886)
Feb 21 21:38:07 localhost.localdomain kernel: Switched to clocksource tsc
Feb 21 21:38:07 localhost.localdomain kernel: usb 3-4: new high-speed USB device number 2 using xhci_hcd
Feb 21 21:38:07 localhost.localdomain kernel: usb 1-1: New USB device found, idVendor=8087, idProduct=800a
Feb 21 21:38:07 localhost.localdomain kernel: usb 1-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
Feb 21 21:38:07 localhost.localdomain kernel: hub 1-1:1.0: USB hub found
Feb 21 21:38:07 localhost.localdomain kernel: hub 1-1:1.0: 6 ports detected
Feb 21 21:38:07 localhost.localdomain kernel: usb 2-1: New USB device found, idVendor=8087, idProduct=8002
Feb 21 21:38:07 localhost.localdomain kernel: usb 2-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
Feb 21 21:38:07 localhost.localdomain kernel: hub 2-1:1.0: USB hub found
Feb 21 21:38:07 localhost.localdomain kernel: hub 2-1:1.0: 8 ports detected
Feb 21 21:38:07 localhost.localdomain kernel: Freeing unused kernel memory: 1876k freed
Feb 21 21:38:07 localhost.localdomain kernel: Write protecting the kernel read-only data: 12288k
Feb 21 21:38:07 localhost.localdomain kernel: Freeing unused kernel memory: 516k freed
Feb 21 21:38:07 localhost.localdomain kernel: Freeing unused kernel memory: 600k freed
Feb 21 21:38:07 localhost.localdomain kernel: usb 3-4: New USB device found, idVendor=413c, idProduct=a001
Feb 21 21:38:07 localhost.localdomain kernel: usb 3-4: New USB device strings: Mfr=1, Product=2, SerialNumber=3
Feb 21 21:38:07 localhost.localdomain kernel: usb 3-4: Product: Gadget USB HUB
Feb 21 21:38:07 localhost.localdomain kernel: usb 3-4: Manufacturer: no manufacturer
Feb 21 21:38:07 localhost.localdomain kernel: usb 3-4: SerialNumber: 0123456789
Feb 21 21:38:07 localhost.localdomain kernel: hub 3-4:1.0: USB hub found
Feb 21 21:38:07 localhost.localdomain kernel: hub 3-4:1.0: 6 ports detected
Feb 21 21:38:07 localhost.localdomain systemd[1]: systemd 219 running in system mode. (+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 -SECCOMP +BLKID +ELFUTILS +KMOD +IDN)
Feb 21 21:38:07 localhost.localdomain systemd[1]: Detected architecture x86-64.
Feb 21 21:38:07 localhost.localdomain systemd[1]: Running in initial RAM disk.
Feb 21 21:38:07 localhost.localdomain systemd[1]: Set hostname to .
Feb 21 21:38:07 localhost.localdomain systemd[1]: Reached target Timers.
Feb 21 21:38:07 localhost.localdomain systemd[1]: Reached target Local File Systems.
Feb 21 21:38:07 localhost.localdomain systemd[1]: Reached target Swap.
Feb 21 21:38:07 localhost.localdomain systemd[1]: Created slice Root Slice.
Feb 21 21:38:07 localhost.localdomain systemd[1]: Listening on udev Kernel Socket.
Feb 21 21:38:07 localhost.localdomain systemd[1]: Created slice System Slice.
Feb 21 21:38:07 localhost.localdomain systemd[1]: Reached target Slices.
Feb 21 21:38:07 localhost.localdomain systemd[1]: Listening on Journal Socket.
Feb 21 21:38:07 localhost.localdomain systemd[1]: Starting Create list of required static device nodes for the current kernel...
Feb 21 21:38:07 localhost.localdomain systemd[1]: Starting Apply Kernel Variables...
Feb 21 21:38:07 localhost.localdomain systemd[1]: Starting dracut cmdline hook...
Feb 21 21:38:07 localhost.localdomain systemd[1]: Listening on udev Control Socket.
Feb 21 21:38:07 localhost.localdomain systemd[1]: Reached target Sockets.
Feb 21 21:38:07 localhost.localdomain systemd[1]: Started Dispatch Password Requests to Console Directory Watch.
Feb 21 21:38:07 localhost.localdomain systemd[1]: Reached target Paths.
Feb 21 21:38:07 localhost.localdomain systemd[1]: Starting Journal Service...
Feb 21 21:38:07 localhost.localdomain systemd[1]: Started Create list of required static device nodes for the current kernel.
Feb 21 21:38:07 localhost.localdomain systemd[1]: Started Apply Kernel Variables.
Feb 21 21:38:07 localhost.localdomain systemd[1]: Starting Create Static Device Nodes in /dev...
Feb 21 21:38:07 localhost.localdomain systemd[1]: Started Journal Service.
Feb 21 21:38:07 localhost.localdomain kernel: usb 3-4.3: new high-speed USB device number 3 using xhci_hcd
Feb 21 21:38:07 localhost.localdomain kernel: usb 3-4.3: New USB device found, idVendor=413c, idProduct=a102
Feb 21 21:38:07 localhost.localdomain kernel: usb 3-4.3: New USB device strings: Mfr=1, Product=2, SerialNumber=0
Feb 21 21:38:07 localhost.localdomain kernel: usb 3-4.3: Product: iDRAC Virtual NIC USB Device
Feb 21 21:38:07 localhost.localdomain kernel: usb 3-4.3: Manufacturer: Dell(TM)
Feb 21 21:38:07 localhost.localdomain kernel: pps_core: LinuxPPS API ver. 1 registered
Feb 21 21:38:07 localhost.localdomain kernel: pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti
Feb 21 21:38:07 localhost.localdomain kernel: libata version 3.00 loaded.
Feb 21 21:38:07 localhost.localdomain kernel: PTP clock support registered
Feb 21 21:38:07 localhost.localdomain kernel: Request for unknown module key 'Mellanox Technologies signing key: 61feb074fc7292f958419386ffdd9d5ca999e403' err -11
Feb 21 21:38:07 localhost.localdomain kernel: mlx_compat: loading out-of-tree module taints kernel.
Feb 21 21:38:07 localhost.localdomain kernel: dca service started, version 1.12.1
Feb 21 21:38:07 localhost.localdomain kernel: mlx_compat: module verification failed: signature and/or required key missing - tainting kernel
Feb 21 21:38:07 localhost.localdomain kernel: ahci 0000:00:1f.2: version 3.0
Feb 21 21:38:07 localhost.localdomain kernel: ahci 0000:00:1f.2: irq 34 for MSI/MSI-X
Feb 21 21:38:07 localhost.localdomain kernel: ahci 0000:00:1f.2: SSS flag set, parallel bus scan disabled
Feb 21 21:38:07 localhost.localdomain kernel: Compat-mlnx-ofed backport release: b4fdfac
Feb 21 21:38:07 localhost.localdomain kernel: Backport based on mlnx_ofed/mlnx-ofa_kernel-4.0.git b4fdfac
Feb 21 21:38:07 localhost.localdomain kernel: compat.git: mlnx_ofed/mlnx-ofa_kernel-4.0.git
Feb 21 21:38:07 localhost.localdomain kernel: ahci 0000:00:1f.2: AHCI 0001.0300 32 slots 6 ports 6 Gbps 0x3f impl SATA mode
Feb 21 21:38:07 localhost.localdomain kernel: ahci 0000:00:1f.2: flags: 64bit ncq stag led clo pio slum part ems apst
Feb 21 21:38:07 localhost.localdomain kernel: Request for unknown module key 'Mellanox Technologies signing key: 61feb074fc7292f958419386ffdd9d5ca999e403' err -11
Feb 21 21:38:07 localhost.localdomain kernel: scsi host0: ahci
Feb 21 21:38:07 localhost.localdomain kernel: scsi host1: ahci
Feb 21 21:38:07 localhost.localdomain kernel: scsi host2: ahci
Feb 21 21:38:07 localhost.localdomain kernel: scsi host3: ahci
Feb 21 21:38:07 localhost.localdomain kernel: scsi host4: ahci
Feb 21 21:38:07 localhost.localdomain kernel: scsi host5: ahci
Feb 21 21:38:07 localhost.localdomain kernel: ata1: SATA max UDMA/133 abar m2048@0x93a10000 port 0x93a10100 irq 34
Feb 21 21:38:07 localhost.localdomain kernel: ata2: SATA max UDMA/133 abar m2048@0x93a10000 port 0x93a10180 irq 34
Feb 21 21:38:07 localhost.localdomain kernel: ata3: SATA max UDMA/133 abar m2048@0x93a10000 port 0x93a10200 irq 34
Feb 21 21:38:07 localhost.localdomain kernel: ata4: SATA max UDMA/133 abar m2048@0x93a10000 port 0x93a10280 irq 34
Feb 21 21:38:07 localhost.localdomain kernel: ata5: SATA max UDMA/133 abar m2048@0x93a10000 port 0x93a10300 irq 34
Feb 21 21:38:07 localhost.localdomain kernel: ata6: SATA max UDMA/133 abar m2048@0x93a10000 port 0x93a10380 irq 34
Feb 21 21:38:07 localhost.localdomain kernel: ixgbe: Intel(R) 10 Gigabit PCI Express Network Driver - version 5.1.0-k-rh7.6
Feb 21 21:38:07 localhost.localdomain kernel: ixgbe: Copyright (c) 1999-2016 Intel Corporation.
Feb 21 21:38:07 localhost.localdomain kernel: Request for unknown module key 'Mellanox Technologies signing key: 61feb074fc7292f958419386ffdd9d5ca999e403' err -11
Feb 21 21:38:07 localhost.localdomain kernel: mlx5_core 0000:01:00.0: firmware version: 12.24.1000
Feb 21 21:38:07 localhost.localdomain kernel: mlx5_core 0000:01:00.0: 126.016 Gb/s available PCIe bandwidth (8 GT/s x16 link)
Feb 21 21:38:07 localhost.localdomain kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Feb 21 21:38:07 localhost.localdomain kernel: ata1.00: ATA-10: INTEL SSDSC2BX200G4R, G201DL2B, max UDMA/133
Feb 21 21:38:07 localhost.localdomain kernel: ata1.00: 390721968 sectors, multi 1: LBA48 NCQ (depth 31/32)
Feb 21 21:38:07 localhost.localdomain kernel: ixgbe 0000:81:00.0: irq 37 for MSI/MSI-X
Feb 21 21:38:07 localhost.localdomain kernel: ixgbe 0000:81:00.0: irq 38 for MSI/MSI-X
Feb 21 21:38:07 localhost.localdomain kernel: ixgbe 0000:81:00.0: irq 39 for MSI/MSI-X
Feb 21 21:38:07 localhost.localdomain kernel: ixgbe 0000:81:00.0: irq 40 for MSI/MSI-X
Feb 21 21:38:07 localhost.localdomain kernel: ixgbe 0000:81:00.0: irq 41 for MSI/MSI-X
Feb 21 21:38:07 localhost.localdomain kernel: ixgbe 0000:81:00.0: irq 42 for MSI/MSI-X
Feb 21 21:38:07 localhost.localdomain kernel: ixgbe 0000:81:00.0: irq 43 for MSI/MSI-X
Feb 21 21:38:07 localhost.localdomain kernel: ixgbe 0000:81:00.0: irq 44 for MSI/MSI-X
Feb 21 21:38:07 localhost.localdomain kernel: ixgbe 0000:81:00.0: irq 45 for MSI/MSI-X
Feb 21 21:38:07 localhost.localdomain kernel: ixgbe 0000:81:00.0: irq 46 for MSI/MSI-X
Feb 21 21:38:07 localhost.localdomain kernel: ixgbe 0000:81:00.0: irq 47 for MSI/MSI-X
Feb 21 21:38:07 localhost.localdomain kernel: ixgbe 0000:81:00.0: irq 48 for MSI/MSI-X
Feb 21 21:38:07 localhost.localdomain kernel: ixgbe 0000:81:00.0: irq 49 for MSI/MSI-X
Feb 21 21:38:07 localhost.localdomain kernel: ixgbe 0000:81:00.0: irq 50 for MSI/MSI-X
Feb 21 21:38:07 localhost.localdomain kernel: ixgbe 0000:81:00.0: irq 51 for MSI/MSI-X
Feb 21 21:38:07 localhost.localdomain kernel: ixgbe 0000:81:00.0: irq 52 for MSI/MSI-X
Feb 21 21:38:07 localhost.localdomain kernel: ixgbe 0000:81:00.0: irq 53 for MSI/MSI-X
Feb 21 21:38:07 localhost.localdomain kernel: ixgbe 0000:81:00.0: irq 54 for MSI/MSI-X
Feb 21 21:38:07 localhost.localdomain kernel: ixgbe 0000:81:00.0: irq 55 for MSI/MSI-X
Feb 21 21:38:07 localhost.localdomain kernel: ixgbe 0000:81:00.0: irq 56 for MSI/MSI-X
Feb 21 21:38:07 localhost.localdomain kernel: ixgbe 0000:81:00.0: irq 57 for MSI/MSI-X
Feb 21 21:38:07 localhost.localdomain kernel: ixgbe 0000:81:00.0: Multiqueue Enabled: Rx Queue count = 20, Tx Queue count = 20 XDP Queue count = 0
Feb 21 21:38:07 localhost.localdomain kernel: ixgbe 0000:81:00.0: PCI Express bandwidth of 32GT/s available
Feb 21 21:38:07 localhost.localdomain kernel: ixgbe 0000:81:00.0: (Speed:5.0GT/s, Width: x8, Encoding Loss:20%)
Feb 21 21:38:07 localhost.localdomain kernel: ixgbe 0000:81:00.0: MAC: 2, PHY: 17, SFP+: 9, PBA No: FFFFFF-0FF
Feb 21 21:38:07 localhost.localdomain kernel: ixgbe 0000:81:00.0: 7c:d3:0a:c1:7f:d8
Feb 21 21:38:07 localhost.localdomain kernel: ixgbe 0000:81:00.0: Intel(R) 10 Gigabit Network Connection
Feb 21 21:38:07 localhost.localdomain kernel: ata1.00: configured for UDMA/133
Feb 21 21:38:07 localhost.localdomain kernel: scsi 0:0:0:0: Direct-Access ATA INTEL SSDSC2BX20 DL2B PQ: 0 ANSI: 5
Feb 21 21:38:09 localhost.localdomain kernel: ixgbe 0000:81:00.1: irq 59 for MSI/MSI-X
Feb 21 21:38:09 localhost.localdomain kernel: 
ixgbe 0000:81:00.1: irq 60 for MSI/MSI-X Feb 21 21:38:09 localhost.localdomain kernel: ixgbe 0000:81:00.1: irq 61 for MSI/MSI-X Feb 21 21:38:09 localhost.localdomain kernel: ixgbe 0000:81:00.1: irq 62 for MSI/MSI-X Feb 21 21:38:09 localhost.localdomain kernel: ixgbe 0000:81:00.1: irq 63 for MSI/MSI-X Feb 21 21:38:09 localhost.localdomain kernel: ixgbe 0000:81:00.1: irq 64 for MSI/MSI-X Feb 21 21:38:09 localhost.localdomain kernel: ixgbe 0000:81:00.1: irq 65 for MSI/MSI-X Feb 21 21:38:09 localhost.localdomain kernel: ixgbe 0000:81:00.1: irq 66 for MSI/MSI-X Feb 21 21:38:09 localhost.localdomain kernel: ixgbe 0000:81:00.1: irq 67 for MSI/MSI-X Feb 21 21:38:09 localhost.localdomain kernel: ixgbe 0000:81:00.1: irq 68 for MSI/MSI-X Feb 21 21:38:09 localhost.localdomain kernel: ixgbe 0000:81:00.1: irq 69 for MSI/MSI-X Feb 21 21:38:09 localhost.localdomain kernel: ixgbe 0000:81:00.1: irq 70 for MSI/MSI-X Feb 21 21:38:09 localhost.localdomain kernel: ixgbe 0000:81:00.1: irq 71 for MSI/MSI-X Feb 21 21:38:09 localhost.localdomain kernel: ixgbe 0000:81:00.1: irq 72 for MSI/MSI-X Feb 21 21:38:09 localhost.localdomain kernel: ixgbe 0000:81:00.1: irq 73 for MSI/MSI-X Feb 21 21:38:09 localhost.localdomain kernel: ixgbe 0000:81:00.1: irq 74 for MSI/MSI-X Feb 21 21:38:09 localhost.localdomain kernel: ixgbe 0000:81:00.1: irq 75 for MSI/MSI-X Feb 21 21:38:09 localhost.localdomain kernel: ixgbe 0000:81:00.1: irq 76 for MSI/MSI-X Feb 21 21:38:09 localhost.localdomain kernel: ixgbe 0000:81:00.1: irq 77 for MSI/MSI-X Feb 21 21:38:09 localhost.localdomain kernel: ixgbe 0000:81:00.1: irq 78 for MSI/MSI-X Feb 21 21:38:09 localhost.localdomain kernel: ixgbe 0000:81:00.1: irq 79 for MSI/MSI-X Feb 21 21:38:09 localhost.localdomain kernel: ixgbe 0000:81:00.1: Multiqueue Enabled: Rx Queue count = 20, Tx Queue count = 20 XDP Queue count = 0 Feb 21 21:38:09 localhost.localdomain kernel: ixgbe 0000:81:00.1: PCI Express bandwidth of 32GT/s available Feb 21 21:38:09 localhost.localdomain kernel: 
ixgbe 0000:81:00.1: (Speed:5.0GT/s, Width: x8, Encoding Loss:20%) Feb 21 21:38:09 localhost.localdomain kernel: ixgbe 0000:81:00.1: MAC: 2, PHY: 1, PBA No: FFFFFF-0FF Feb 21 21:38:09 localhost.localdomain kernel: ixgbe 0000:81:00.1: 7c:d3:0a:c1:7f:da Feb 21 21:38:09 localhost.localdomain kernel: ixgbe 0000:81:00.1: Intel(R) 10 Gigabit Network Connection Feb 21 21:38:09 localhost.localdomain kernel: mlx5_core 0000:01:00.0: irq 80 for MSI/MSI-X Feb 21 21:38:09 localhost.localdomain kernel: mlx5_core 0000:01:00.0: irq 81 for MSI/MSI-X Feb 21 21:38:09 localhost.localdomain kernel: mlx5_core 0000:01:00.0: irq 82 for MSI/MSI-X Feb 21 21:38:09 localhost.localdomain kernel: mlx5_core 0000:01:00.0: irq 83 for MSI/MSI-X Feb 21 21:38:09 localhost.localdomain kernel: mlx5_core 0000:01:00.0: irq 84 for MSI/MSI-X Feb 21 21:38:09 localhost.localdomain kernel: mlx5_core 0000:01:00.0: irq 85 for MSI/MSI-X Feb 21 21:38:09 localhost.localdomain kernel: mlx5_core 0000:01:00.0: irq 86 for MSI/MSI-X Feb 21 21:38:09 localhost.localdomain kernel: mlx5_core 0000:01:00.0: irq 87 for MSI/MSI-X Feb 21 21:38:09 localhost.localdomain kernel: mlx5_core 0000:01:00.0: irq 88 for MSI/MSI-X Feb 21 21:38:09 localhost.localdomain kernel: mlx5_core 0000:01:00.0: irq 89 for MSI/MSI-X Feb 21 21:38:09 localhost.localdomain kernel: mlx5_core 0000:01:00.0: irq 90 for MSI/MSI-X Feb 21 21:38:09 localhost.localdomain kernel: mlx5_core 0000:01:00.0: irq 91 for MSI/MSI-X Feb 21 21:38:09 localhost.localdomain kernel: mlx5_core 0000:01:00.0: irq 92 for MSI/MSI-X Feb 21 21:38:09 localhost.localdomain kernel: mlx5_core 0000:01:00.0: irq 93 for MSI/MSI-X Feb 21 21:38:09 localhost.localdomain kernel: mlx5_core 0000:01:00.0: irq 94 for MSI/MSI-X Feb 21 21:38:09 localhost.localdomain kernel: mlx5_core 0000:01:00.0: irq 95 for MSI/MSI-X Feb 21 21:38:09 localhost.localdomain kernel: mlx5_core 0000:01:00.0: irq 96 for MSI/MSI-X Feb 21 21:38:09 localhost.localdomain kernel: mlx5_core 0000:01:00.0: irq 97 for MSI/MSI-X Feb 
21 21:38:09 localhost.localdomain kernel: mlx5_core 0000:01:00.0: irq 98 for MSI/MSI-X Feb 21 21:38:09 localhost.localdomain kernel: mlx5_core 0000:01:00.0: irq 99 for MSI/MSI-X Feb 21 21:38:09 localhost.localdomain kernel: mlx5_core 0000:01:00.0: irq 100 for MSI/MSI-X Feb 21 21:38:09 localhost.localdomain kernel: mlx5_core 0000:01:00.0: irq 101 for MSI/MSI-X Feb 21 21:38:09 localhost.localdomain kernel: mlx5_core 0000:01:00.0: irq 102 for MSI/MSI-X Feb 21 21:38:09 localhost.localdomain kernel: mlx5_core 0000:01:00.0: irq 103 for MSI/MSI-X Feb 21 21:38:09 localhost.localdomain kernel: mlx5_core 0000:01:00.0: Port module event: module 0, Cable plugged Feb 21 21:38:09 localhost.localdomain kernel: ata1.00: Enabling discard_zeroes_data Feb 21 21:38:09 localhost.localdomain kernel: sd 0:0:0:0: [sda] 390721968 512-byte logical blocks: (200 GB/186 GiB) Feb 21 21:38:09 localhost.localdomain kernel: sd 0:0:0:0: [sda] 4096-byte physical blocks Feb 21 21:38:09 localhost.localdomain kernel: sd 0:0:0:0: [sda] Write Protect is off Feb 21 21:38:09 localhost.localdomain kernel: sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00 Feb 21 21:38:09 localhost.localdomain kernel: sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA Feb 21 21:38:09 localhost.localdomain kernel: ata1.00: Enabling discard_zeroes_data Feb 21 21:38:09 localhost.localdomain kernel: sda: sda1 sda2 sda3 Feb 21 21:38:09 localhost.localdomain kernel: ata1.00: Enabling discard_zeroes_data Feb 21 21:38:09 localhost.localdomain kernel: sd 0:0:0:0: [sda] Attached SCSI disk Feb 21 21:38:09 localhost.localdomain kernel: mlx5_core 0000:01:00.0: FW Tracer Owner Feb 21 21:38:09 localhost.localdomain kernel: Request for unknown module key 'Mellanox Technologies signing key: 61feb074fc7292f958419386ffdd9d5ca999e403' err -11 Feb 21 21:38:09 localhost.localdomain kernel: Request for unknown module key 'Mellanox Technologies signing key: 61feb074fc7292f958419386ffdd9d5ca999e403' err -11 Feb 21 
21:38:09 localhost.localdomain kernel: Request for unknown module key 'Mellanox Technologies signing key: 61feb074fc7292f958419386ffdd9d5ca999e403' err -11 Feb 21 21:38:09 localhost.localdomain kernel: mlx5_ib: Mellanox Connect-IB Infiniband driver v4.5-1.0.1 Feb 21 21:38:09 localhost.localdomain kernel: mlx5_ib: Mellanox Connect-IB Infiniband driver v4.5-1.0.1 Feb 21 21:38:10 localhost.localdomain kernel: ata2: SATA link down (SStatus 0 SControl 300) Feb 21 21:38:10 localhost.localdomain kernel: random: crng init done Feb 21 21:38:10 localhost.localdomain kernel: ata3: SATA link down (SStatus 0 SControl 300) Feb 21 21:38:10 localhost.localdomain kernel: ata4: SATA link down (SStatus 0 SControl 300) Feb 21 21:38:10 localhost.localdomain kernel: ata5: SATA link down (SStatus 0 SControl 300) Feb 21 21:38:11 localhost.localdomain kernel: ata6: SATA link down (SStatus 0 SControl 300) Feb 21 21:38:11 localhost.localdomain kernel: SGI XFS with ACLs, security attributes, no debug enabled Feb 21 21:38:11 localhost.localdomain kernel: XFS (sda1): Mounting V5 Filesystem Feb 21 21:38:11 localhost.localdomain kernel: XFS (sda1): Ending clean mount Feb 21 21:38:12 sh-101-19.int systemd-journald[231]: Received SIGTERM from PID 1 (systemd). Feb 21 21:38:12 sh-101-19.int kernel: SELinux: Disabled at runtime. Feb 21 21:38:12 sh-101-19.int kernel: SELinux: Unregistering netfilter hooks Feb 21 21:38:12 sh-101-19.int kernel: type=1404 audit(1550813892.116:2): selinux=0 auid=4294967295 ses=4294967295 Feb 21 21:38:12 sh-101-19.int kernel: ip_tables: (C) 2000-2006 Netfilter Core Team Feb 21 21:38:12 sh-101-19.int systemd[1]: Inserted module 'ip_tables' Feb 21 21:38:12 sh-101-19.int kernel: loop: module loaded Feb 21 21:38:12 sh-101-19.int kernel: EXT4-fs (loop0): mounted filesystem with ordered data mode. 
Opts: user_xattr Feb 21 21:38:13 sh-101-19.int kernel: ACPI Error: No handler for Region [SYSI] (ffff9bf6a9e956c0) [IPMI] (20130517/evregion-162) Feb 21 21:38:13 sh-101-19.int kernel: ACPI Error: Region IPMI (ID=7) has no handler (20130517/exfldio-305) Feb 21 21:38:13 sh-101-19.int kernel: ACPI Error: Method parse/execution failed [\_SB_.PMI0._GHL] (Node ffff9bf6a9e7e550), AE_NOT_EXIST (20130517/psparse-536) Feb 21 21:38:13 sh-101-19.int kernel: ACPI Error: Method parse/execution failed [\_SB_.PMI0._PMC] (Node ffff9bf6a9e7e4b0), AE_NOT_EXIST (20130517/psparse-536) Feb 21 21:38:13 sh-101-19.int kernel: ACPI Exception: AE_NOT_EXIST, Evaluating _PMC (20130517/power_meter-753) Feb 21 21:38:13 sh-101-19.int kernel: mei_me 0000:00:16.0: Device doesn't have valid ME Interface Feb 21 21:38:13 sh-101-19.int kernel: ipmi message handler version 39.2 Feb 21 21:38:13 sh-101-19.int kernel: ipmi device interface Feb 21 21:38:13 sh-101-19.int kernel: IPMI System Interface driver. Feb 21 21:38:13 sh-101-19.int kernel: ipmi_si ipmi_si.0: ipmi_platform: probing via SMBIOS Feb 21 21:38:13 sh-101-19.int kernel: ipmi_si: SMBIOS: io 0xca8 regsize 1 spacing 4 irq 10 Feb 21 21:38:13 sh-101-19.int kernel: ipmi_si: Adding SMBIOS-specified kcs state machine Feb 21 21:38:13 sh-101-19.int kernel: sd 0:0:0:0: Attached scsi generic sg0 type 0 Feb 21 21:38:13 sh-101-19.int kernel: ipmi_si: Trying SMBIOS-specified kcs state machine at i/o address 0xca8, slave address 0x20, irq 10 Feb 21 21:38:13 sh-101-19.int kernel: input: PC Speaker as /devices/platform/pcspkr/input/input1 Feb 21 21:38:13 sh-101-19.int kernel: cdc_ether 3-4.3:1.0 eth0: register 'cdc_ether' at usb-0000:00:14.0-4.3, CDC Ethernet Device, 7c:d3:0a:c1:7f:dd Feb 21 21:38:13 sh-101-19.int kernel: usbcore: registered new interface driver cdc_ether Feb 21 21:38:13 sh-101-19.int kernel: ipmi_si ipmi_si.0: The BMC does not support setting the recv irq bit, compensating, but the BMC needs to be fixed. 
Feb 21 21:38:13 sh-101-19.int kernel: ipmi_si ipmi_si.0: Using irq 10 Feb 21 21:38:13 sh-101-19.int kernel: ipmi_si ipmi_si.0: Found new BMC (man_id: 0x0002a2, prod_id: 0x0100, dev_id: 0x20) Feb 21 21:38:13 sh-101-19.int kernel: ipmi_si ipmi_si.0: IPMI kcs interface initialized Feb 21 21:38:13 sh-101-19.int kernel: [TTM] Zone kernel: Available graphics memory: 65870398 kiB Feb 21 21:38:13 sh-101-19.int kernel: [TTM] Zone dma32: Available graphics memory: 2097152 kiB Feb 21 21:38:13 sh-101-19.int kernel: [TTM] Initializing pool allocator Feb 21 21:38:13 sh-101-19.int kernel: [TTM] Initializing DMA pool allocator Feb 21 21:38:13 sh-101-19.int kernel: Adding 4194300k swap on /dev/sda3. Priority:-2 extents:1 across:4194300k SSFS Feb 21 21:38:13 sh-101-19.int kernel: fbcon: mgadrmfb (fb0) is primary device Feb 21 21:38:13 sh-101-19.int kernel: cryptd: max_cpu_qlen set to 1000 Feb 21 21:38:13 sh-101-19.int kernel: XFS (sda2): Mounting V5 Filesystem Feb 21 21:38:13 sh-101-19.int kernel: AVX2 version of gcm_enc/dec engaged. 
Feb 21 21:38:13 sh-101-19.int kernel: AES CTR mode by8 optimization enabled Feb 21 21:38:13 sh-101-19.int kernel: XFS (sda2): Ending clean mount Feb 21 21:38:13 sh-101-19.int kernel: alg: No test for __gcm-aes-aesni (__driver-gcm-aes-aesni) Feb 21 21:38:13 sh-101-19.int kernel: alg: No test for __generic-gcm-aes-aesni (__driver-generic-gcm-aes-aesni) Feb 21 21:38:13 sh-101-19.int kernel: Console: switching to colour frame buffer device 128x48 Feb 21 21:38:13 sh-101-19.int kernel: mgag200 0000:08:00.0: fb0: mgadrmfb frame buffer device Feb 21 21:38:13 sh-101-19.int kernel: kvm: disabled by bios Feb 21 21:38:13 sh-101-19.int kernel: dcdbas dcdbas: Dell Systems Management Base Driver (version 5.6.0-3.3) Feb 21 21:38:13 sh-101-19.int kernel: [drm] Initialized mgag200 1.0.0 20110418 for 0000:08:00.0 on minor 0 Feb 21 21:38:13 sh-101-19.int kernel: iTCO_vendor_support: vendor-support=0 Feb 21 21:38:13 sh-101-19.int kernel: iTCO_wdt: Intel TCO WatchDog Timer Driver v1.11 Feb 21 21:38:13 sh-101-19.int kernel: iTCO_wdt: unable to reset NO_REBOOT flag, device disabled by hardware/BIOS Feb 21 21:38:13 sh-101-19.int kernel: intel_rapl: Found RAPL domain package Feb 21 21:38:13 sh-101-19.int kernel: intel_rapl: Found RAPL domain dram Feb 21 21:38:13 sh-101-19.int kernel: intel_rapl: DRAM domain energy unit 15300pj Feb 21 21:38:13 sh-101-19.int kernel: intel_rapl: RAPL package 0 domain package locked by BIOS Feb 21 21:38:13 sh-101-19.int kernel: intel_rapl: Found RAPL domain package Feb 21 21:38:13 sh-101-19.int kernel: intel_rapl: Found RAPL domain dram Feb 21 21:38:13 sh-101-19.int kernel: intel_rapl: DRAM domain energy unit 15300pj Feb 21 21:38:13 sh-101-19.int kernel: intel_rapl: RAPL package 1 domain package locked by BIOS Feb 21 21:38:13 sh-101-19.int kernel: kvm: disabled by bios Feb 21 21:38:13 sh-101-19.int kernel: EDAC sbridge: Seeking for: PCI ID 8086:6fa0 Feb 21 21:38:13 sh-101-19.int kernel: EDAC sbridge: Seeking for: PCI ID 8086:6fa0 Feb 21 21:38:13 sh-101-19.int 
kernel: EDAC sbridge: Seeking for: PCI ID 8086:6fa0 Feb 21 21:38:13 sh-101-19.int kernel: EDAC sbridge: Seeking for: PCI ID 8086:6f60 Feb 21 21:38:13 sh-101-19.int kernel: EDAC sbridge: Seeking for: PCI ID 8086:6fa8 Feb 21 21:38:13 sh-101-19.int kernel: EDAC sbridge: Seeking for: PCI ID 8086:6fa8 Feb 21 21:38:13 sh-101-19.int kernel: EDAC sbridge: Seeking for: PCI ID 8086:6fa8 Feb 21 21:38:13 sh-101-19.int kernel: EDAC sbridge: Seeking for: PCI ID 8086:6f71 Feb 21 21:38:13 sh-101-19.int kernel: EDAC sbridge: Seeking for: PCI ID 8086:6f71 Feb 21 21:38:13 sh-101-19.int kernel: EDAC sbridge: Seeking for: PCI ID 8086:6f71 Feb 21 21:38:13 sh-101-19.int kernel: EDAC sbridge: Seeking for: PCI ID 8086:6faa Feb 21 21:38:13 sh-101-19.int kernel: EDAC sbridge: Seeking for: PCI ID 8086:6faa Feb 21 21:38:13 sh-101-19.int kernel: EDAC sbridge: Seeking for: PCI ID 8086:6faa Feb 21 21:38:13 sh-101-19.int kernel: EDAC sbridge: Seeking for: PCI ID 8086:6fab Feb 21 21:38:13 sh-101-19.int kernel: EDAC sbridge: Seeking for: PCI ID 8086:6fab Feb 21 21:38:13 sh-101-19.int kernel: EDAC sbridge: Seeking for: PCI ID 8086:6fab Feb 21 21:38:13 sh-101-19.int kernel: EDAC sbridge: Seeking for: PCI ID 8086:6fac Feb 21 21:38:13 sh-101-19.int kernel: EDAC sbridge: Seeking for: PCI ID 8086:6fac Feb 21 21:38:13 sh-101-19.int kernel: EDAC sbridge: Seeking for: PCI ID 8086:6fac Feb 21 21:38:13 sh-101-19.int kernel: EDAC sbridge: Seeking for: PCI ID 8086:6fad Feb 21 21:38:13 sh-101-19.int kernel: EDAC sbridge: Seeking for: PCI ID 8086:6fad Feb 21 21:38:13 sh-101-19.int kernel: EDAC sbridge: Seeking for: PCI ID 8086:6fad Feb 21 21:38:13 sh-101-19.int kernel: EDAC sbridge: Seeking for: PCI ID 8086:6f68 Feb 21 21:38:13 sh-101-19.int kernel: EDAC sbridge: Seeking for: PCI ID 8086:6f79 Feb 21 21:38:13 sh-101-19.int kernel: EDAC sbridge: Seeking for: PCI ID 8086:6f6a Feb 21 21:38:13 sh-101-19.int kernel: EDAC sbridge: Seeking for: PCI ID 8086:6f6b Feb 21 21:38:13 sh-101-19.int kernel: EDAC sbridge: Seeking 
for: PCI ID 8086:6f6c Feb 21 21:38:13 sh-101-19.int kernel: EDAC sbridge: Seeking for: PCI ID 8086:6f6d Feb 21 21:38:13 sh-101-19.int kernel: EDAC sbridge: Seeking for: PCI ID 8086:6ffc Feb 21 21:38:13 sh-101-19.int kernel: EDAC sbridge: Seeking for: PCI ID 8086:6ffc Feb 21 21:38:13 sh-101-19.int kernel: EDAC sbridge: Seeking for: PCI ID 8086:6ffc Feb 21 21:38:13 sh-101-19.int kernel: EDAC sbridge: Seeking for: PCI ID 8086:6ffd Feb 21 21:38:13 sh-101-19.int kernel: EDAC sbridge: Seeking for: PCI ID 8086:6ffd Feb 21 21:38:13 sh-101-19.int kernel: EDAC sbridge: Seeking for: PCI ID 8086:6ffd Feb 21 21:38:13 sh-101-19.int kernel: EDAC sbridge: Seeking for: PCI ID 8086:6faf Feb 21 21:38:13 sh-101-19.int kernel: EDAC sbridge: Seeking for: PCI ID 8086:6faf Feb 21 21:38:13 sh-101-19.int kernel: EDAC sbridge: Seeking for: PCI ID 8086:6faf Feb 21 21:38:13 sh-101-19.int kernel: EDAC MC0: Giving out device to 'sb_edac.c' 'Broadwell SrcID#0_Ha#0': DEV 0000:7f:12.0 Feb 21 21:38:13 sh-101-19.int kernel: EDAC MC1: Giving out device to 'sb_edac.c' 'Broadwell SrcID#1_Ha#0': DEV 0000:ff:12.0 Feb 21 21:38:13 sh-101-19.int kernel: EDAC sbridge: Ver: 1.1.2 Feb 21 21:38:14 sh-101-19.int kernel: kvm: disabled by bios Feb 21 21:38:14 sh-101-19.int kernel: type=1305 audit(1550813894.310:3): audit_pid=10586 old=0 auid=4294967295 ses=4294967295 res=1 Feb 21 21:38:14 sh-101-19.int kernel: RPC: Registered named UNIX socket transport module. Feb 21 21:38:14 sh-101-19.int kernel: RPC: Registered udp transport module. Feb 21 21:38:14 sh-101-19.int kernel: RPC: Registered tcp transport module. Feb 21 21:38:14 sh-101-19.int kernel: RPC: Registered tcp NFSv4.1 backchannel transport module. 
Feb 21 21:38:15 sh-101-19.int kernel: Request for unknown module key 'Mellanox Technologies signing key: 61feb074fc7292f958419386ffdd9d5ca999e403' err -11 Feb 21 21:38:15 sh-101-19.int kernel: FS-Cache: Loaded Feb 21 21:38:15 sh-101-19.int kernel: CacheFiles: Loaded Feb 21 21:38:15 sh-101-19.int kernel: FS-Cache: Cache "mycache" added (type cachefiles) Feb 21 21:38:15 sh-101-19.int kernel: Request for unknown module key 'Mellanox Technologies signing key: 61feb074fc7292f958419386ffdd9d5ca999e403' err -11 Feb 21 21:38:15 sh-101-19.int kernel: CacheFiles: File cache on loop0 registered Feb 21 21:38:15 sh-101-19.int kernel: Request for unknown module key 'Mellanox Technologies signing key: 61feb074fc7292f958419386ffdd9d5ca999e403' err -11 Feb 21 21:38:15 sh-101-19.int kernel: Request for unknown module key 'Mellanox Technologies signing key: 61feb074fc7292f958419386ffdd9d5ca999e403' err -11 Feb 21 21:38:15 sh-101-19.int kernel: Request for unknown module key 'Mellanox Technologies signing key: 61feb074fc7292f958419386ffdd9d5ca999e403' err -11 Feb 21 21:38:15 sh-101-19.int kernel: Request for unknown module key 'Mellanox Technologies signing key: 61feb074fc7292f958419386ffdd9d5ca999e403' err -11 Feb 21 21:38:15 sh-101-19.int kernel: Request for unknown module key 'Mellanox Technologies signing key: 61feb074fc7292f958419386ffdd9d5ca999e403' err -11 Feb 21 21:38:15 sh-101-19.int kernel: mlx5_core 0000:01:00.0: slow_pci_heuristic:5202:(pid 10991): Max link speed = 100000, PCI BW = 126016 Feb 21 21:38:15 sh-101-19.int kernel: mlx5_core 0000:01:00.0: MLX5E: StrdRq(0) RqSz(1024) StrdSz(64) RxCqeCmprss(0) Feb 21 21:38:15 sh-101-19.int kernel: mlx5_core 0000:01:00.0: MLX5E: StrdRq(0) RqSz(1024) StrdSz(64) RxCqeCmprss(0) Feb 21 21:38:15 sh-101-19.int kernel: Request for unknown module key 'Mellanox Technologies signing key: 61feb074fc7292f958419386ffdd9d5ca999e403' err -11 Feb 21 21:38:15 sh-101-19.int kernel: Request for unknown module key 'Mellanox Technologies signing key: 
61feb074fc7292f958419386ffdd9d5ca999e403' err -11 Feb 21 21:38:15 sh-101-19.int kernel: Request for unknown module key 'Mellanox Technologies signing key: 61feb074fc7292f958419386ffdd9d5ca999e403' err -11 Feb 21 21:38:16 sh-101-19.int kernel: Request for unknown module key 'Mellanox Technologies signing key: 61feb074fc7292f958419386ffdd9d5ca999e403' err -11 Feb 21 21:38:16 sh-101-19.int kernel: ixgbe 0000:81:00.0: registered PHC device on em1 Feb 21 21:38:16 sh-101-19.int kernel: IPv6: ADDRCONF(NETDEV_UP): em1: link is not ready Feb 21 21:38:16 sh-101-19.int kernel: ixgbe 0000:81:00.0 em1: detected SFP+: 9 Feb 21 21:38:19 sh-101-19.int kernel: ixgbe 0000:81:00.0 em1: NIC Link is Up 1 Gbps, Flow Control: RX/TX Feb 21 21:38:19 sh-101-19.int kernel: IPv6: ADDRCONF(NETDEV_CHANGE): em1: link becomes ready Feb 21 21:38:21 sh-101-19.int kernel: IPv6: ADDRCONF(NETDEV_UP): ib0: link is not ready Feb 21 21:38:21 sh-101-19.int kernel: IPv6: ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready Feb 21 21:38:27 sh-101-19.int kernel: FS-Cache: Netfs 'nfs' registered for caching Feb 21 21:43:00 sh-101-19.int kernel: kvm: disabled by bios Feb 21 21:43:04 sh-101-19.int kernel: LNet: HW NUMA nodes: 2, HW CPU cores: 20, npartitions: 2 Feb 21 21:43:04 sh-101-19.int kernel: alg: No test for adler32 (adler32-zlib) Feb 21 21:43:05 sh-101-19.int kernel: LNet: Using FastReg for registration Feb 21 21:43:05 sh-101-19.int kernel: Lustre: Lustre: Build Version: 2.12.0_srcc01 Feb 21 21:43:05 sh-101-19.int kernel: LNet: Added LNI 10.9.101.19@o2ib4 [8/256/0/180] Feb 21 21:43:05 sh-101-19.int kernel: LNetError: 93441:0:(api-ni.c:3146:lnet_dyn_del_ni()) net tcp not found Feb 21 21:44:22 sh-101-19.int kernel: Lustre: Mounted fir-client Feb 21 21:44:23 sh-101-19.int kernel: Lustre: Mounted oak-client Feb 21 21:44:23 sh-101-19.int kernel: fuse init (API version 7.22) Feb 21 21:44:50 sh-101-19.int kernel: Key type dns_resolver registered Feb 21 21:44:50 sh-101-19.int kernel: NFS: Registering the 
id_resolver key type Feb 21 21:44:50 sh-101-19.int kernel: Key type id_resolver registered Feb 21 21:44:50 sh-101-19.int kernel: Key type id_legacy registered Feb 21 22:22:26 sh-101-19.int kernel: perf: interrupt took too long (2515 > 2500), lowering kernel.perf_event_max_sample_rate to 79000 Feb 21 22:28:34 sh-101-19.int kernel: perf: interrupt took too long (3144 > 3143), lowering kernel.perf_event_max_sample_rate to 63000 Feb 21 22:35:31 sh-101-19.int kernel: perf: interrupt took too long (3931 > 3930), lowering kernel.perf_event_max_sample_rate to 50000 Feb 21 23:20:49 sh-101-19.int kernel: perf: interrupt took too long (4914 > 4913), lowering kernel.perf_event_max_sample_rate to 40000 Feb 22 02:06:23 sh-101-19.int kernel: NFS: nfs4_reclaim_open_state: unhandled error -521 Feb 22 03:19:16 sh-101-19.int kernel: TECH PREVIEW: Overlay filesystem may not be fully supported. Please review provided documentation for limitations. Feb 22 03:19:16 sh-101-19.int kernel: squashfs: version 4.0 (2009/01/31) Phillip Lougher Feb 22 04:16:49 sh-101-19.int kernel: perf: interrupt took too long (6151 > 6142), lowering kernel.perf_event_max_sample_rate to 32000 Feb 22 10:16:23 sh-101-19.int kernel: perf: interrupt took too long (8534 > 7688), lowering kernel.perf_event_max_sample_rate to 23000 Feb 22 16:47:02 sh-101-19.int kernel: NFS: nfs4_reclaim_open_state: unhandled error -521 Feb 22 19:52:54 sh-101-19.int kernel: NFS: nfs4_reclaim_open_state: unhandled error -521 Feb 22 20:31:41 sh-101-19.int kernel: NFS: nfs4_reclaim_open_state: unhandled error -521 Feb 22 21:24:42 sh-101-19.int kernel: NFS: nfs4_reclaim_open_state: unhandled error -521 Feb 22 21:46:57 sh-101-19.int kernel: NFS: nfs4_reclaim_open_state: unhandled error -521 Feb 22 21:46:57 sh-101-19.int kernel: NFS: nfs4_reclaim_open_state: unhandled error -521 Feb 22 21:49:05 sh-101-19.int kernel: NFS: nfs4_reclaim_open_state: unhandled error -521 Feb 22 21:53:24 sh-101-19.int kernel: NFS: nfs4_reclaim_open_state: 
unhandled error -521 Feb 22 22:44:45 sh-101-19.int kernel: NFS: nfs4_reclaim_open_state: unhandled error -521 Feb 22 23:37:03 sh-101-19.int kernel: NFS: nfs4_reclaim_open_state: unhandled error -521 Feb 23 00:11:19 sh-101-19.int kernel: NFS: nfs4_reclaim_open_state: unhandled error -521 Feb 23 00:11:19 sh-101-19.int kernel: NFS: nfs4_reclaim_open_state: unhandled error -521 Feb 23 00:47:13 sh-101-19.int kernel: NFS: nfs4_reclaim_open_state: unhandled error -521 Feb 23 01:26:28 sh-101-19.int kernel: NFS: nfs4_reclaim_open_state: unhandled error -521 Feb 23 01:43:00 sh-101-19.int kernel: blastp invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0 Feb 23 01:43:00 sh-101-19.int kernel: blastp cpuset=step_batch mems_allowed=0-1 Feb 23 01:43:00 sh-101-19.int kernel: CPU: 15 PID: 124410 Comm: blastp Kdump: loaded Tainted: G OE ------------ T 3.10.0-957.5.1.el7.x86_64 #1 Feb 23 01:43:00 sh-101-19.int kernel: Hardware name: Dell Inc. PowerEdge C6320/082F9M, BIOS 2.8.0 05/28/2018 Feb 23 01:43:00 sh-101-19.int kernel: Call Trace: Feb 23 01:43:00 sh-101-19.int kernel: [] dump_stack+0x19/0x1b Feb 23 01:43:00 sh-101-19.int kernel: [] dump_header+0x90/0x229 Feb 23 01:43:00 sh-101-19.int kernel: [] ? default_wake_function+0x12/0x20 Feb 23 01:43:00 sh-101-19.int kernel: [] ? find_lock_task_mm+0x56/0xc0 Feb 23 01:43:00 sh-101-19.int kernel: [] ? try_get_mem_cgroup_from_mm+0x28/0x60 Feb 23 01:43:00 sh-101-19.int kernel: [] oom_kill_process+0x254/0x3d0 Feb 23 01:43:00 sh-101-19.int kernel: [] mem_cgroup_oom_synchronize+0x546/0x570 Feb 23 01:43:00 sh-101-19.int kernel: [] ? 
mem_cgroup_charge_common+0xc0/0xc0
Feb 23 01:43:00 sh-101-19.int kernel: [] pagefault_out_of_memory+0x14/0x90
Feb 23 01:43:00 sh-101-19.int kernel: [] mm_fault_error+0x6a/0x157
Feb 23 01:43:00 sh-101-19.int kernel: [] __do_page_fault+0x3c8/0x500
Feb 23 01:43:00 sh-101-19.int kernel: [] do_page_fault+0x35/0x90
Feb 23 01:43:00 sh-101-19.int kernel: [] page_fault+0x28/0x30
Feb 23 01:43:00 sh-101-19.int kernel: Task in /slurm/uid_325192/job_38152136/step_batch/task_0 killed as a result of limit of /slurm/uid_325192/job_38152136/step_batch
Feb 23 01:43:00 sh-101-19.int kernel: memory: usage 4096000kB, limit 4096000kB, failcnt 4591
Feb 23 01:43:00 sh-101-19.int kernel: memory+swap: usage 4096000kB, limit 4096000kB, failcnt 0
Feb 23 01:43:00 sh-101-19.int kernel: kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
Feb 23 01:43:00 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_325192/job_38152136/step_batch: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Feb 23 01:43:00 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_325192/job_38152136/step_batch/task_0: cache:0KB rss:4096000KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:59776KB active_anon:4036224KB inactive_file:0KB active_file:0KB unevictable:0KB
Feb 23 01:43:00 sh-101-19.int kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
Feb 23 01:43:00 sh-101-19.int kernel: [124397] 325192 124397    28684      810      13        0             0 slurm_script
Feb 23 01:43:00 sh-101-19.int kernel: [124410] 325192 124410  1063989  1042926    2078        0             0 blastp
Feb 23 01:43:00 sh-101-19.int kernel: Memory cgroup out of memory: Kill process 124410 (blastp) score 1020 or sacrifice child
Feb 23 01:43:00 sh-101-19.int kernel: Killed process 124410 (blastp) total-vm:4255956kB, anon-rss:4094064kB, file-rss:77640kB, shmem-rss:0kB
Feb 23 02:07:46 sh-101-19.int kernel: NFS: nfs4_reclaim_open_state: unhandled error -521
Feb 23 02:19:53 sh-101-19.int kernel: NFS: nfs4_reclaim_open_state: unhandled error -521
Feb 23 02:40:59 sh-101-19.int kernel: NFS: nfs4_reclaim_open_state: unhandled error -521
Feb 23 02:55:42 sh-101-19.int kernel: NFS: nfs4_reclaim_open_state: unhandled error -521
Feb 23 03:58:34 sh-101-19.int kernel: NFS: nfs4_reclaim_open_state: unhandled error -521
Feb 23 04:03:47 sh-101-19.int kernel: NFS: nfs4_reclaim_open_state: unhandled error -521
Feb 23 05:19:28 sh-101-19.int kernel: NFS: nfs4_reclaim_open_state: unhandled error -521
Feb 23 08:02:36 sh-101-19.int kernel: SU2_CFD (193240): Using mlock ulimits for SHM_HUGETLB is deprecated
Feb 25 11:39:50 sh-101-19.int kernel: NFS: nfs4_reclaim_open_state: unhandled error -521
Feb 25 11:55:44 sh-101-19.int kernel: NFS: nfs4_reclaim_open_state: unhandled error -521
Feb 25 11:58:05 sh-101-19.int kernel: NFS: nfs4_reclaim_open_state: unhandled error -521
Feb 25 12:12:56 sh-101-19.int kernel: NFS: nfs4_reclaim_open_state: unhandled error -521
Feb 25 12:16:46 sh-101-19.int kernel: NFS: nfs4_reclaim_open_state: unhandled error -521
Feb 25 12:39:40 sh-101-19.int kernel: NFS: nfs4_reclaim_open_state: unhandled error -521
Feb 25 14:52:06 sh-101-19.int kernel: NFS: nfs4_reclaim_open_state: unhandled error -521
Feb 25 14:52:47 sh-101-19.int kernel: NFS: nfs4_reclaim_open_state: unhandled error -521
Feb 25 14:56:34 sh-101-19.int kernel: NFS: nfs4_reclaim_open_state: unhandled error -521
Feb 25 15:11:34 sh-101-19.int kernel: NFS: nfs4_reclaim_open_state: unhandled error -521
Feb 25 15:13:01 sh-101-19.int kernel: NFS: nfs4_reclaim_open_state: unhandled error -521
Feb 25 17:24:58 sh-101-19.int kernel: NFS: nfs4_reclaim_open_state: unhandled error -521
Feb 25 18:40:37 sh-101-19.int kernel: NFS: nfs4_reclaim_open_state: unhandled error -521
Feb 25 18:57:20 sh-101-19.int kernel: NFS: nfs4_reclaim_open_state: unhandled error -521
Feb 25 19:28:24 sh-101-19.int kernel: NFS: nfs4_reclaim_open_state: unhandled error -521
Feb 25 19:43:54 sh-101-19.int kernel: NFS: nfs4_reclaim_open_state: unhandled error -521
Feb 25 20:35:58 sh-101-19.int kernel: NFS: nfs4_reclaim_open_state: unhandled error -521
Feb 25 21:19:33 sh-101-19.int kernel: NFS: nfs4_reclaim_open_state: unhandled error -521
Feb 25 21:38:14 sh-101-19.int kernel: NFS: nfs4_reclaim_open_state: unhandled error -521
Feb 26 06:49:16 sh-101-19.int kernel: NFS: nfs4_reclaim_open_state: unhandled error -521
Feb 26 10:44:47 sh-101-19.int kernel: NFS: nfs4_reclaim_open_state: unhandled error -521
Feb 26 10:54:17 sh-101-19.int kernel: NFS: nfs4_reclaim_open_state: unhandled error -521
Feb 26 13:58:42 sh-101-19.int kernel: NFS: nfs4_reclaim_open_state: unhandled error -521
Feb 26 15:08:45 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail
Feb 26 15:08:45 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551222225, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf81f6dcec0/0xcb737997ae14d99b lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc51620f42 expref: -99 pid: 93562 timeout: 0 lvb_type: 0
Feb 26 15:08:45 sh-101-19.int kernel: LustreError: 137661:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bfe19a68300) refcount nonzero (1) after lock cleanup; forcing cleanup.
Feb 26 15:08:45 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1)
Feb 26 15:13:52 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail
Feb 26 15:13:52 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551222532, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bee803ad7c0/0xcb737997aeb79b3f lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc5162a601 expref: -99 pid: 93562 timeout: 0 lvb_type: 0
Feb 26 15:13:52 sh-101-19.int kernel: LustreError: 97843:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bf4a5b39200) refcount nonzero (1) after lock cleanup; forcing cleanup.
Feb 26 15:13:52 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1)
Feb 26 15:18:57 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail
Feb 26 15:18:57 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551222837, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf4d4f34140/0xcb737997af692401 lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc516337fe expref: -99 pid: 93562 timeout: 0 lvb_type: 0
Feb 26 15:18:57 sh-101-19.int kernel: LustreError: 33619:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bf0da4f2480) refcount nonzero (1) after lock cleanup; forcing cleanup.
Feb 26 15:18:57 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1)
Feb 26 15:24:06 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail
Feb 26 15:24:06 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551223146, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf56e7b8900/0xcb737997b00c567e lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc5163d98a expref: -99 pid: 93562 timeout: 0 lvb_type: 0
Feb 26 15:24:06 sh-101-19.int kernel: LustreError: 14176:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bf4e36ac000) refcount nonzero (1) after lock cleanup; forcing cleanup.
Feb 26 15:24:06 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1)
Feb 26 15:29:14 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail
Feb 26 15:29:14 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551223454, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bee387233c0/0xcb737997b0af7f00 lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc51647352 expref: -99 pid: 93562 timeout: 0 lvb_type: 0
Feb 26 15:29:14 sh-101-19.int kernel: LustreError: 180426:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bf538b25ec0) refcount nonzero (1) after lock cleanup; forcing cleanup.
Feb 26 15:29:14 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1)
Feb 26 15:34:23 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail
Feb 26 15:34:23 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551223763, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9be99be82400/0xcb737997b15fbff6 lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc51651a25 expref: -99 pid: 93562 timeout: 0 lvb_type: 0
Feb 26 15:34:23 sh-101-19.int kernel: LustreError: 136584:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bf319e735c0) refcount nonzero (1) after lock cleanup; forcing cleanup.
Feb 26 15:34:23 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1)
Feb 26 15:39:32 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail
Feb 26 15:39:32 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551224072, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf56e6f5100/0xcb737997b2047b7d lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc5165b790 expref: -99 pid: 93562 timeout: 0 lvb_type: 0
Feb 26 15:39:32 sh-101-19.int kernel: LustreError: 92513:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bf50c2aeb40) refcount nonzero (1) after lock cleanup; forcing cleanup.
Feb 26 15:39:32 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1)
Feb 26 15:44:40 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail
Feb 26 15:44:40 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551224380, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf533227740/0xcb737997b29b472d lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc516656c2 expref: -99 pid: 93562 timeout: 0 lvb_type: 0
Feb 26 15:44:40 sh-101-19.int kernel: LustreError: 68812:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bf422fdab40) refcount nonzero (1) after lock cleanup; forcing cleanup.
Feb 26 15:44:40 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1)
Feb 26 15:49:49 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail
Feb 26 15:49:49 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551224689, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf556bea1c0/0xcb737997b3254adb lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc5166f51b expref: -99 pid: 93562 timeout: 0 lvb_type: 0
Feb 26 15:49:49 sh-101-19.int kernel: LustreError: 37460:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bf3e2623a40) refcount nonzero (1) after lock cleanup; forcing cleanup.
Feb 26 15:49:49 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1)
Feb 26 15:54:55 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail
Feb 26 15:54:55 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551224995, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9c03c7e29440/0xcb737997b39ef291 lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc51678a36 expref: -99 pid: 93562 timeout: 0 lvb_type: 0
Feb 26 15:54:55 sh-101-19.int kernel: LustreError: 4229:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9c04bde1ecc0) refcount nonzero (1) after lock cleanup; forcing cleanup.
Feb 26 15:54:55 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1)
Feb 26 16:00:04 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail
Feb 26 16:00:04 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551225303, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9be5c6dbd100/0xcb737997b41bf0b7 lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc51682803 expref: -99 pid: 93562 timeout: 0 lvb_type: 0
Feb 26 16:00:04 sh-101-19.int kernel: LustreError: 165589:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bf4e13c3080) refcount nonzero (1) after lock cleanup; forcing cleanup.
Feb 26 16:00:04 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1)
Feb 26 16:10:14 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail
Feb 26 16:10:14 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message
Feb 26 16:10:14 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551225914, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9be82ff53180/0xcb737997b54700c1 lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc516940b2 expref: -99 pid: 93562 timeout: 0 lvb_type: 0
Feb 26 16:10:15 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message
Feb 26 16:10:15 sh-101-19.int kernel: LustreError: 101038:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bf49fae3500) refcount nonzero (1) after lock cleanup; forcing cleanup.
Feb 26 16:10:15 sh-101-19.int kernel: LustreError: 101038:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message
Feb 26 16:10:15 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1)
Feb 26 16:10:15 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Feb 26 16:20:33 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail
Feb 26 16:20:33 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message
Feb 26 16:20:33 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551226532, 301s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf564bcf740/0xcb737997b69308b7 lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc516a8996 expref: -99 pid: 93562 timeout: 0 lvb_type: 0
Feb 26 16:20:33 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message
Feb 26 16:20:33 sh-101-19.int kernel: LustreError: 7965:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bf4797120c0) refcount nonzero (1) after lock cleanup; forcing cleanup.
Feb 26 16:20:33 sh-101-19.int kernel: LustreError: 7965:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message
Feb 26 16:20:33 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1)
Feb 26 16:20:33 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Feb 26 16:30:49 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail
Feb 26 16:30:49 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message
Feb 26 16:30:49 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551227149, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf4b2217bc0/0xcb737997b7d65de2 lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc516bc648 expref: -99 pid: 93562 timeout: 0 lvb_type: 0
Feb 26 16:30:49 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message
Feb 26 16:30:49 sh-101-19.int kernel: LustreError: 174194:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9be75db27200) refcount nonzero (1) after lock cleanup; forcing cleanup.
Feb 26 16:30:49 sh-101-19.int kernel: LustreError: 174194:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message
Feb 26 16:30:49 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1)
Feb 26 16:30:49 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Feb 26 16:41:01 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail
Feb 26 16:41:01 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message
Feb 26 16:41:01 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551227761, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf3d824aac0/0xcb737997b91dc263 lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc516ce87b expref: -99 pid: 93562 timeout: 0 lvb_type: 0
Feb 26 16:41:01 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message
Feb 26 16:41:01 sh-101-19.int kernel: LustreError: 12465:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9c0428624cc0) refcount nonzero (1) after lock cleanup; forcing cleanup.
Feb 26 16:41:01 sh-101-19.int kernel: LustreError: 12465:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message
Feb 26 16:41:01 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1)
Feb 26 16:41:01 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Feb 26 16:51:19 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail
Feb 26 16:51:19 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message
Feb 26 16:51:19 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551228379, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bee38720b40/0xcb737997ba4b6045 lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc516e2cea expref: -99 pid: 93562 timeout: 0 lvb_type: 0
Feb 26 16:51:19 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message
Feb 26 16:51:19 sh-101-19.int kernel: LustreError: 82423:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9be995c338c0) refcount nonzero (1) after lock cleanup; forcing cleanup.
Feb 26 16:51:19 sh-101-19.int kernel: LustreError: 82423:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message
Feb 26 16:51:19 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1)
Feb 26 16:51:19 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Feb 26 17:01:37 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail
Feb 26 17:01:37 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message
Feb 26 17:01:37 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551228997, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf50f27ba80/0xcb737997bba1fc31 lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc516f708e expref: -99 pid: 93562 timeout: 0 lvb_type: 0
Feb 26 17:01:37 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message
Feb 26 17:01:37 sh-101-19.int kernel: LustreError: 115453:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bf4bcf92300) refcount nonzero (1) after lock cleanup; forcing cleanup.
Feb 26 17:01:37 sh-101-19.int kernel: LustreError: 115453:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message
Feb 26 17:01:37 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1)
Feb 26 17:01:37 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Feb 26 17:11:49 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail
Feb 26 17:11:49 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message
Feb 26 17:11:49 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551229609, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf5137b0d80/0xcb737997bcae5933 lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc51709609 expref: -99 pid: 93562 timeout: 0 lvb_type: 0
Feb 26 17:11:49 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message
Feb 26 17:11:49 sh-101-19.int kernel: LustreError: 63836:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bf52e3e8540) refcount nonzero (1) after lock cleanup; forcing cleanup.
Feb 26 17:11:49 sh-101-19.int kernel: LustreError: 63836:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message
Feb 26 17:11:49 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1)
Feb 26 17:11:49 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Feb 26 17:22:02 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail
Feb 26 17:22:02 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message
Feb 26 17:22:02 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551230222, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf456289f80/0xcb737997bdbfc209 lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc5171bee8 expref: -99 pid: 93562 timeout: 0 lvb_type: 0
Feb 26 17:22:02 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message
Feb 26 17:22:02 sh-101-19.int kernel: LustreError: 178664:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bf5353ca9c0) refcount nonzero (1) after lock cleanup; forcing cleanup.
Feb 26 17:22:02 sh-101-19.int kernel: LustreError: 178664:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message
Feb 26 17:22:02 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1)
Feb 26 17:22:02 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Feb 26 17:32:19 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail
Feb 26 17:32:19 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message
Feb 26 17:32:19 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551230839, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf478773840/0xcb737997bece655b lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc5172ff36 expref: -99 pid: 93562 timeout: 0 lvb_type: 0
Feb 26 17:32:19 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message
Feb 26 17:32:19 sh-101-19.int kernel: LustreError: 133305:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9be63584ce40) refcount nonzero (1) after lock cleanup; forcing cleanup.
Feb 26 17:32:19 sh-101-19.int kernel: LustreError: 133305:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message
Feb 26 17:32:19 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1)
Feb 26 17:32:19 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Feb 26 17:42:33 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail
Feb 26 17:42:33 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message
Feb 26 17:42:33 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551231453, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf45eb9c380/0xcb737997c00966ce lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc51742c7c expref: -99 pid: 93562 timeout: 0 lvb_type: 0
Feb 26 17:42:33 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message
Feb 26 17:42:33 sh-101-19.int kernel: LustreError: 79279:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9beacde4bbc0) refcount nonzero (1) after lock cleanup; forcing cleanup.
Feb 26 17:42:33 sh-101-19.int kernel: LustreError: 79279:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message
Feb 26 17:42:33 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1)
Feb 26 17:42:33 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Feb 26 17:52:46 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail
Feb 26 17:52:46 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message
Feb 26 17:52:46 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551232066, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bffe0a2c5c0/0xcb737997c14a65f7 lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc517550a7 expref: -99 pid: 93562 timeout: 0 lvb_type: 0
Feb 26 17:52:46 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message
Feb 26 17:52:46 sh-101-19.int kernel: LustreError: 105271:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bf5d62b92c0) refcount nonzero (1) after lock cleanup; forcing cleanup.
Feb 26 17:52:46 sh-101-19.int kernel: LustreError: 105271:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message
Feb 26 17:52:46 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1)
Feb 26 17:52:46 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Feb 26 18:03:01 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail
Feb 26 18:03:01 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message
Feb 26 18:03:01 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551232681, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bfb087d7500/0xcb737997c2a3f813 lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc517686fa expref: -99 pid: 93562 timeout: 0 lvb_type: 0
Feb 26 18:03:01 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message
Feb 26 18:03:01 sh-101-19.int kernel: LustreError: 113660:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9c05f2eae480) refcount nonzero (1) after lock cleanup; forcing cleanup.
Feb 26 18:03:01 sh-101-19.int kernel: LustreError: 113660:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message
Feb 26 18:03:01 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1)
Feb 26 18:03:01 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Feb 26 18:13:19 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail
Feb 26 18:13:19 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message
Feb 26 18:13:19 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551233299, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9c03de746540/0xcb737997c3f11507 lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc5177c8f3 expref: -99 pid: 93562 timeout: 0 lvb_type: 0
Feb 26 18:13:19 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message
Feb 26 18:13:19 sh-101-19.int kernel: LustreError: 136189:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9c0355e22f00) refcount nonzero (1) after lock cleanup; forcing cleanup.
Feb 26 18:13:19 sh-101-19.int kernel: LustreError: 136189:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message
Feb 26 18:13:19 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1)
Feb 26 18:13:19 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Feb 26 18:23:34 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail
Feb 26 18:23:34 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message
Feb 26 18:23:34 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551233914, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9c01ef38e540/0xcb737997c51ef7ca lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc5178fc8a expref: -99 pid: 93562 timeout: 0 lvb_type: 0
Feb 26 18:23:34 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message
Feb 26 18:23:34 sh-101-19.int kernel: LustreError: 82365:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bfae27b5bc0) refcount nonzero (1) after lock cleanup; forcing cleanup.
Feb 26 18:23:34 sh-101-19.int kernel: LustreError: 82365:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message Feb 26 18:23:34 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1) Feb 26 18:23:34 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Feb 26 18:33:50 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail Feb 26 18:33:50 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message Feb 26 18:33:50 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551234530, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf4e6725c40/0xcb737997c6426e05 lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc517a2eae expref: -99 pid: 93562 timeout: 0 lvb_type: 0 Feb 26 18:33:50 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message Feb 26 18:33:50 sh-101-19.int kernel: LustreError: 75199:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bf55aab4900) refcount nonzero (1) after lock cleanup; forcing cleanup. 
Feb 26 18:33:50 sh-101-19.int kernel: LustreError: 75199:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message Feb 26 18:33:50 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1) Feb 26 18:33:50 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Feb 26 18:44:07 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail Feb 26 18:44:07 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message Feb 26 18:44:07 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551235147, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bed747fd340/0xcb737997c77a4125 lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc517b6a2c expref: -99 pid: 93562 timeout: 0 lvb_type: 0 Feb 26 18:44:07 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message Feb 26 18:44:07 sh-101-19.int kernel: LustreError: 54542:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bf4b8f52b40) refcount nonzero (1) after lock cleanup; forcing cleanup. 
Feb 26 18:44:07 sh-101-19.int kernel: LustreError: 54542:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message Feb 26 18:44:07 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1) Feb 26 18:44:07 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Feb 26 18:54:20 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail Feb 26 18:54:20 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message Feb 26 18:54:20 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551235760, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bff03f11440/0xcb737997c8a07f30 lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc517c8f30 expref: -99 pid: 93562 timeout: 0 lvb_type: 0 Feb 26 18:54:20 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message Feb 26 18:54:20 sh-101-19.int kernel: LustreError: 16905:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bea53396300) refcount nonzero (1) after lock cleanup; forcing cleanup. 
Feb 26 18:54:20 sh-101-19.int kernel: LustreError: 16905:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message Feb 26 18:54:20 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1) Feb 26 18:54:20 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Feb 26 19:04:35 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail Feb 26 19:04:35 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message Feb 26 19:04:35 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551236375, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9c04a7f5c800/0xcb737997c9b6dd9f lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc517dba15 expref: -99 pid: 93562 timeout: 0 lvb_type: 0 Feb 26 19:04:35 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message Feb 26 19:04:35 sh-101-19.int kernel: LustreError: 154322:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bfaebae98c0) refcount nonzero (1) after lock cleanup; forcing cleanup. 
Feb 26 19:04:35 sh-101-19.int kernel: LustreError: 154322:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message Feb 26 19:04:35 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1) Feb 26 19:04:35 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Feb 26 19:14:50 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail Feb 26 19:14:50 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message Feb 26 19:14:50 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551236990, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf48f7972c0/0xcb737997cadad55a lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc517eebc9 expref: -99 pid: 93562 timeout: 0 lvb_type: 0 Feb 26 19:14:50 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message Feb 26 19:14:50 sh-101-19.int kernel: LustreError: 94265:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9c03d9aeccc0) refcount nonzero (1) after lock cleanup; forcing cleanup. 
Feb 26 19:14:50 sh-101-19.int kernel: LustreError: 94265:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message Feb 26 19:14:50 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1) Feb 26 19:14:50 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Feb 26 19:25:09 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail Feb 26 19:25:09 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message Feb 26 19:25:09 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551237609, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf48faa5a00/0xcb737997cbf65b2d lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc51802d0c expref: -99 pid: 93562 timeout: 0 lvb_type: 0 Feb 26 19:25:09 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message Feb 26 19:25:09 sh-101-19.int kernel: LustreError: 20827:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bf4fc628300) refcount nonzero (1) after lock cleanup; forcing cleanup. 
Feb 26 19:25:09 sh-101-19.int kernel: LustreError: 20827:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message Feb 26 19:25:09 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1) Feb 26 19:25:09 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Feb 26 19:35:25 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail Feb 26 19:35:25 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message Feb 26 19:35:25 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551238225, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9be5c7ba3cc0/0xcb737997ccebf58f lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc51816113 expref: -99 pid: 93562 timeout: 0 lvb_type: 0 Feb 26 19:35:25 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message Feb 26 19:35:25 sh-101-19.int kernel: LustreError: 28044:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9be5f6b9ca80) refcount nonzero (1) after lock cleanup; forcing cleanup. 
Feb 26 19:35:25 sh-101-19.int kernel: LustreError: 28044:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message Feb 26 19:35:25 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1) Feb 26 19:35:25 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Feb 26 19:45:38 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail Feb 26 19:45:38 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message Feb 26 19:45:38 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551238838, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf4f63b1440/0xcb737997ce0508fe lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc51828bff expref: -99 pid: 93562 timeout: 0 lvb_type: 0 Feb 26 19:45:38 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message Feb 26 19:45:38 sh-101-19.int kernel: LustreError: 81479:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bf4b133ad80) refcount nonzero (1) after lock cleanup; forcing cleanup. 
Feb 26 19:45:38 sh-101-19.int kernel: LustreError: 81479:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message Feb 26 19:45:38 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1) Feb 26 19:45:38 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Feb 26 19:55:54 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail Feb 26 19:55:54 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message Feb 26 19:55:54 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551239454, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9be7d6743840/0xcb737997cf52a3f9 lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc5183c5c4 expref: -99 pid: 93562 timeout: 0 lvb_type: 0 Feb 26 19:55:54 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message Feb 26 19:55:54 sh-101-19.int kernel: LustreError: 95505:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bf560687440) refcount nonzero (1) after lock cleanup; forcing cleanup. 
Feb 26 19:55:54 sh-101-19.int kernel: LustreError: 95505:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message Feb 26 19:55:54 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1) Feb 26 19:55:54 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Feb 26 20:06:10 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail Feb 26 20:06:10 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message Feb 26 20:06:10 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551240070, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf47963b840/0xcb737997d093dce9 lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc5184fa7a expref: -99 pid: 93562 timeout: 0 lvb_type: 0 Feb 26 20:06:10 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message Feb 26 20:06:10 sh-101-19.int kernel: LustreError: 141819:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bf3e6bb0480) refcount nonzero (1) after lock cleanup; forcing cleanup. 
Feb 26 20:06:10 sh-101-19.int kernel: LustreError: 141819:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message Feb 26 20:06:10 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1) Feb 26 20:06:10 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Feb 26 20:16:22 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail Feb 26 20:16:22 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message Feb 26 20:16:22 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551240682, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf5d338af40/0xcb737997d1e29f26 lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc5186268c expref: -99 pid: 93562 timeout: 0 lvb_type: 0 Feb 26 20:16:22 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message Feb 26 20:16:22 sh-101-19.int kernel: LustreError: 165325:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9be7d6b849c0) refcount nonzero (1) after lock cleanup; forcing cleanup. 
Feb 26 20:16:22 sh-101-19.int kernel: LustreError: 165325:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message Feb 26 20:16:22 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1) Feb 26 20:16:22 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Feb 26 20:26:38 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail Feb 26 20:26:38 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message Feb 26 20:26:38 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551241298, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf4efe57500/0xcb737997d3058818 lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc51875cca expref: -99 pid: 93562 timeout: 0 lvb_type: 0 Feb 26 20:26:38 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message Feb 26 20:26:38 sh-101-19.int kernel: LustreError: 181894:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9c03bbff4a80) refcount nonzero (1) after lock cleanup; forcing cleanup. 
Feb 26 20:26:38 sh-101-19.int kernel: LustreError: 181894:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message Feb 26 20:26:38 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1) Feb 26 20:26:38 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Feb 26 20:36:53 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail Feb 26 20:36:53 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message Feb 26 20:36:53 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551241913, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf550214ec0/0xcb737997d43ccbfe lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc518891b8 expref: -99 pid: 93562 timeout: 0 lvb_type: 0 Feb 26 20:36:54 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message Feb 26 20:36:54 sh-101-19.int kernel: LustreError: 163390:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bf4646d12c0) refcount nonzero (1) after lock cleanup; forcing cleanup. 
Feb 26 20:36:54 sh-101-19.int kernel: LustreError: 163390:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message Feb 26 20:36:54 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1) Feb 26 20:36:54 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Feb 26 20:47:10 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail Feb 26 20:47:10 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message Feb 26 20:47:10 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551242530, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf4e7fcfbc0/0xcb737997d54b531f lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc5189c7cc expref: -99 pid: 93562 timeout: 0 lvb_type: 0 Feb 26 20:47:10 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message Feb 26 20:47:10 sh-101-19.int kernel: LustreError: 152740:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bf4e13c3680) refcount nonzero (1) after lock cleanup; forcing cleanup. 
Feb 26 20:47:10 sh-101-19.int kernel: LustreError: 152740:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message Feb 26 20:47:10 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1) Feb 26 20:47:10 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Feb 26 20:57:23 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail Feb 26 20:57:23 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message Feb 26 20:57:23 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551243143, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9c03eeaddc40/0xcb737997d697a62b lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc518af478 expref: -99 pid: 93562 timeout: 0 lvb_type: 0 Feb 26 20:57:23 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message Feb 26 20:57:23 sh-101-19.int kernel: LustreError: 105544:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bf50b24fec0) refcount nonzero (1) after lock cleanup; forcing cleanup. 
Feb 26 20:57:23 sh-101-19.int kernel: LustreError: 105544:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message Feb 26 20:57:23 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1) Feb 26 20:57:23 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Feb 26 21:07:39 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail Feb 26 21:07:39 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message Feb 26 21:07:39 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551243759, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf3f1318900/0xcb737997d7e99e7f lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc518c2ac4 expref: -99 pid: 93562 timeout: 0 lvb_type: 0 Feb 26 21:07:39 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message Feb 26 21:07:39 sh-101-19.int kernel: LustreError: 147131:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bf986ec6540) refcount nonzero (1) after lock cleanup; forcing cleanup. 
Feb 26 21:07:39 sh-101-19.int kernel: LustreError: 147131:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message Feb 26 21:07:39 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1) Feb 26 21:07:39 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Feb 26 21:17:58 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail Feb 26 21:17:58 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message Feb 26 21:17:58 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551244378, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf545b45580/0xcb737997d94846a2 lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc518d68fe expref: -99 pid: 93562 timeout: 0 lvb_type: 0 Feb 26 21:17:58 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message Feb 26 21:17:58 sh-101-19.int kernel: LustreError: 160017:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bf43667a6c0) refcount nonzero (1) after lock cleanup; forcing cleanup. 
Feb 26 21:17:58 sh-101-19.int kernel: LustreError: 160017:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message Feb 26 21:17:58 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1) Feb 26 21:17:58 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Feb 26 21:28:19 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail Feb 26 21:28:19 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message Feb 26 21:28:19 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551244999, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf447a045c0/0xcb737997da9294b1 lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc518eab52 expref: -99 pid: 93562 timeout: 0 lvb_type: 0 Feb 26 21:28:19 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message Feb 26 21:28:19 sh-101-19.int kernel: LustreError: 43796:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bf4b133b8c0) refcount nonzero (1) after lock cleanup; forcing cleanup. 
Feb 26 21:28:19 sh-101-19.int kernel: LustreError: 43796:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message Feb 26 21:28:19 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1) Feb 26 21:28:19 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Feb 26 21:38:34 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail Feb 26 21:38:34 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message Feb 26 21:38:34 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551245614, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf557f4e0c0/0xcb737997dbd86156 lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc518fdbd2 expref: -99 pid: 93562 timeout: 0 lvb_type: 0 Feb 26 21:38:34 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message Feb 26 21:38:34 sh-101-19.int kernel: LustreError: 23759:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bf3d7217d40) refcount nonzero (1) after lock cleanup; forcing cleanup. 
Feb 26 21:38:34 sh-101-19.int kernel: LustreError: 23759:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message Feb 26 21:38:34 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1) Feb 26 21:38:34 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Feb 26 21:48:49 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail Feb 26 21:48:49 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message Feb 26 21:48:49 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551246229, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf5db26c380/0xcb737997dcffa62e lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc51910dda expref: -99 pid: 93562 timeout: 0 lvb_type: 0 Feb 26 21:48:49 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message Feb 26 21:48:49 sh-101-19.int kernel: LustreError: 27952:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bf42eff8840) refcount nonzero (1) after lock cleanup; forcing cleanup. 
Feb 26 21:48:49 sh-101-19.int kernel: LustreError: 27952:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message Feb 26 21:48:49 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1) Feb 26 21:48:49 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Feb 26 21:59:02 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail Feb 26 21:59:02 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message Feb 26 21:59:02 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551246842, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9c03eab78b40/0xcb737997de18b43a lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc51923db9 expref: -99 pid: 93562 timeout: 0 lvb_type: 0 Feb 26 21:59:02 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message Feb 26 21:59:02 sh-101-19.int kernel: LustreError: 67378:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bf4e5311080) refcount nonzero (1) after lock cleanup; forcing cleanup. 
Feb 26 21:59:02 sh-101-19.int kernel: LustreError: 67378:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message Feb 26 21:59:02 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1) Feb 26 21:59:02 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Feb 26 22:09:21 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail Feb 26 22:09:21 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message Feb 26 22:09:21 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551247461, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf3d128af40/0xcb737997df571650 lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc519379ca expref: -99 pid: 93562 timeout: 0 lvb_type: 0 Feb 26 22:09:21 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message Feb 26 22:09:21 sh-101-19.int kernel: LustreError: 102228:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bf4e13c3440) refcount nonzero (1) after lock cleanup; forcing cleanup. 
Feb 26 22:09:21 sh-101-19.int kernel: LustreError: 102228:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message
Feb 26 22:09:21 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1)
Feb 26 22:09:21 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Feb 26 22:19:35 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail
Feb 26 22:19:35 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message
Feb 26 22:19:35 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551248075, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bfff871cc80/0xcb737997e09cde80 lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc5194ad92 expref: -99 pid: 93562 timeout: 0 lvb_type: 0
Feb 26 22:19:35 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message
Feb 26 22:19:35 sh-101-19.int kernel: LustreError: 94398:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bf3203b7080) refcount nonzero (1) after lock cleanup; forcing cleanup.
Feb 26 22:19:35 sh-101-19.int kernel: LustreError: 94398:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message
Feb 26 22:19:35 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1)
Feb 26 22:19:35 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Feb 26 22:29:49 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail
Feb 26 22:29:49 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message
Feb 26 22:29:49 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551248689, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9c036fa0ec00/0xcb737997e1dfa781 lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc5195e003 expref: -99 pid: 93562 timeout: 0 lvb_type: 0
Feb 26 22:29:49 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message
Feb 26 22:29:49 sh-101-19.int kernel: LustreError: 13081:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bf6a8e57c80) refcount nonzero (1) after lock cleanup; forcing cleanup.
Feb 26 22:29:49 sh-101-19.int kernel: LustreError: 13081:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message
Feb 26 22:29:49 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1)
Feb 26 22:29:49 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Feb 26 22:40:06 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail
Feb 26 22:40:06 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message
Feb 26 22:40:06 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551249306, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bed3fcb9f80/0xcb737997e336bbaf lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc51971935 expref: -99 pid: 93562 timeout: 0 lvb_type: 0
Feb 26 22:40:06 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message
Feb 26 22:40:06 sh-101-19.int kernel: LustreError: 6187:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bfae27b5740) refcount nonzero (1) after lock cleanup; forcing cleanup.
Feb 26 22:40:06 sh-101-19.int kernel: LustreError: 6187:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message
Feb 26 22:40:06 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1)
Feb 26 22:40:06 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Feb 26 22:50:23 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail
Feb 26 22:50:23 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message
Feb 26 22:50:23 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551249923, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bfb6e24d580/0xcb737997e4971c31 lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc51985037 expref: -99 pid: 93562 timeout: 0 lvb_type: 0
Feb 26 22:50:23 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message
Feb 26 22:50:23 sh-101-19.int kernel: LustreError: 50233:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9c04013bce40) refcount nonzero (1) after lock cleanup; forcing cleanup.
Feb 26 22:50:23 sh-101-19.int kernel: LustreError: 50233:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message
Feb 26 22:50:23 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1)
Feb 26 22:50:23 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Feb 26 23:00:40 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail
Feb 26 23:00:40 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message
Feb 26 23:00:40 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551250540, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf3f43e3cc0/0xcb737997e605cfa2 lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc51998907 expref: -99 pid: 93562 timeout: 0 lvb_type: 0
Feb 26 23:00:40 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message
Feb 26 23:00:40 sh-101-19.int kernel: LustreError: 105283:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bf55f658f00) refcount nonzero (1) after lock cleanup; forcing cleanup.
Feb 26 23:00:40 sh-101-19.int kernel: LustreError: 105283:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message
Feb 26 23:00:40 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1)
Feb 26 23:00:40 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Feb 26 23:10:56 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail
Feb 26 23:10:56 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message
Feb 26 23:10:56 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551251156, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf95b3806c0/0xcb737997e7764bef lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc519abe7a expref: -99 pid: 93562 timeout: 0 lvb_type: 0
Feb 26 23:10:56 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message
Feb 26 23:10:56 sh-101-19.int kernel: LustreError: 124860:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9c05e7e12f00) refcount nonzero (1) after lock cleanup; forcing cleanup.
Feb 26 23:10:56 sh-101-19.int kernel: LustreError: 124860:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message
Feb 26 23:10:56 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1)
Feb 26 23:10:56 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Feb 26 23:21:09 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail
Feb 26 23:21:09 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message
Feb 26 23:21:09 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551251769, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf3f7a2f980/0xcb737997e8d37a04 lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc519beed7 expref: -99 pid: 93562 timeout: 0 lvb_type: 0
Feb 26 23:21:09 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message
Feb 26 23:21:09 sh-101-19.int kernel: LustreError: 185939:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bf5353cb140) refcount nonzero (1) after lock cleanup; forcing cleanup.
Feb 26 23:21:09 sh-101-19.int kernel: LustreError: 185939:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message
Feb 26 23:21:09 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1)
Feb 26 23:21:09 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Feb 26 23:31:25 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail
Feb 26 23:31:25 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message
Feb 26 23:31:25 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551252385, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf5d53d5c40/0xcb737997ea3cb869 lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc519d240b expref: -99 pid: 93562 timeout: 0 lvb_type: 0
Feb 26 23:31:25 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message
Feb 26 23:31:25 sh-101-19.int kernel: LustreError: 32281:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bf46ab99a40) refcount nonzero (1) after lock cleanup; forcing cleanup.
Feb 26 23:31:25 sh-101-19.int kernel: LustreError: 32281:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message
Feb 26 23:31:25 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1)
Feb 26 23:31:25 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Feb 26 23:41:38 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail
Feb 26 23:41:38 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message
Feb 26 23:41:38 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551252998, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf447a01d40/0xcb737997eba32f0c lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc519e5636 expref: -99 pid: 93562 timeout: 0 lvb_type: 0
Feb 26 23:41:38 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message
Feb 26 23:41:38 sh-101-19.int kernel: LustreError: 38186:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9c058c2ee6c0) refcount nonzero (1) after lock cleanup; forcing cleanup.
Feb 26 23:41:38 sh-101-19.int kernel: LustreError: 38186:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message
Feb 26 23:41:38 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1)
Feb 26 23:41:38 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Feb 26 23:51:54 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail
Feb 26 23:51:54 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message
Feb 26 23:51:54 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551253614, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf2d2af1b00/0xcb737997ecff70ef lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc519f8cf2 expref: -99 pid: 93562 timeout: 0 lvb_type: 0
Feb 26 23:51:54 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message
Feb 26 23:51:54 sh-101-19.int kernel: LustreError: 184141:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9beaec9cafc0) refcount nonzero (1) after lock cleanup; forcing cleanup.
Feb 26 23:51:54 sh-101-19.int kernel: LustreError: 184141:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message
Feb 26 23:51:54 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1)
Feb 26 23:51:55 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Feb 27 00:02:08 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail
Feb 27 00:02:08 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message
Feb 27 00:02:08 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551254228, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf5247a8900/0xcb737997ee6357c9 lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc51a0bfbe expref: -99 pid: 93562 timeout: 0 lvb_type: 0
Feb 27 00:02:09 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message
Feb 27 00:02:09 sh-101-19.int kernel: LustreError: 178153:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bf50c2af140) refcount nonzero (1) after lock cleanup; forcing cleanup.
Feb 27 00:02:09 sh-101-19.int kernel: LustreError: 178153:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message
Feb 27 00:02:09 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1)
Feb 27 00:02:09 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Feb 27 00:12:24 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail
Feb 27 00:12:24 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message
Feb 27 00:12:24 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551254844, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf47963f2c0/0xcb737997efbd96ca lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc51a1f634 expref: -99 pid: 93562 timeout: 0 lvb_type: 0
Feb 27 00:12:24 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message
Feb 27 00:12:24 sh-101-19.int kernel: LustreError: 168908:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9be802afe3c0) refcount nonzero (1) after lock cleanup; forcing cleanup.
Feb 27 00:12:24 sh-101-19.int kernel: LustreError: 168908:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message
Feb 27 00:12:24 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1)
Feb 27 00:12:24 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Feb 27 00:22:40 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail
Feb 27 00:22:40 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message
Feb 27 00:22:40 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551255460, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf5d3acde80/0xcb737997f1196534 lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc51a32c33 expref: -99 pid: 93562 timeout: 0 lvb_type: 0
Feb 27 00:22:40 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message
Feb 27 00:22:40 sh-101-19.int kernel: LustreError: 168538:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bf4ddee2240) refcount nonzero (1) after lock cleanup; forcing cleanup.
Feb 27 00:22:40 sh-101-19.int kernel: LustreError: 168538:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message
Feb 27 00:22:40 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1)
Feb 27 00:22:40 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Feb 27 00:32:56 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail
Feb 27 00:32:56 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message
Feb 27 00:32:56 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551256076, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf49bf16540/0xcb737997f296c146 lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc51a4630b expref: -99 pid: 93562 timeout: 0 lvb_type: 0
Feb 27 00:32:57 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message
Feb 27 00:32:57 sh-101-19.int kernel: LustreError: 137113:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bf92c684a80) refcount nonzero (1) after lock cleanup; forcing cleanup.
Feb 27 00:32:57 sh-101-19.int kernel: LustreError: 137113:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message
Feb 27 00:32:57 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1)
Feb 27 00:32:57 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Feb 27 00:43:14 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail
Feb 27 00:43:14 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message
Feb 27 00:43:14 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551256694, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf553aef080/0xcb737997f408c0e6 lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc51a59a68 expref: -99 pid: 93562 timeout: 0 lvb_type: 0
Feb 27 00:43:14 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message
Feb 27 00:43:14 sh-101-19.int kernel: LustreError: 139090:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9be7f26040c0) refcount nonzero (1) after lock cleanup; forcing cleanup.
Feb 27 00:43:14 sh-101-19.int kernel: LustreError: 139090:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message
Feb 27 00:43:14 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1)
Feb 27 00:43:14 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Feb 27 00:53:33 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail
Feb 27 00:53:33 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message
Feb 27 00:53:33 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551257313, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf456f3e540/0xcb737997f5516951 lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc51a6d4b9 expref: -99 pid: 93562 timeout: 0 lvb_type: 0
Feb 27 00:53:33 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message
Feb 27 00:53:33 sh-101-19.int kernel: LustreError: 181271:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9c03f6b36840) refcount nonzero (1) after lock cleanup; forcing cleanup.
Feb 27 00:53:33 sh-101-19.int kernel: LustreError: 181271:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message
Feb 27 00:53:33 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1)
Feb 27 00:53:33 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Feb 27 01:03:48 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail
Feb 27 01:03:48 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message
Feb 27 01:03:48 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551257928, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf5db226e40/0xcb737997f6b4896f lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc51a809d8 expref: -99 pid: 93562 timeout: 0 lvb_type: 0
Feb 27 01:03:48 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message
Feb 27 01:03:48 sh-101-19.int kernel: LustreError: 25191:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bf482adc240) refcount nonzero (1) after lock cleanup; forcing cleanup.
Feb 27 01:03:48 sh-101-19.int kernel: LustreError: 25191:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message
Feb 27 01:03:48 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1)
Feb 27 01:03:48 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Feb 27 01:14:06 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail
Feb 27 01:14:06 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message
Feb 27 01:14:06 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551258546, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf3573f5e80/0xcb737997f82b2c45 lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc51a942cb expref: -99 pid: 93562 timeout: 0 lvb_type: 0
Feb 27 01:14:06 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message
Feb 27 01:14:06 sh-101-19.int kernel: LustreError: 69815:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bf547e3e240) refcount nonzero (1) after lock cleanup; forcing cleanup.
Feb 27 01:14:06 sh-101-19.int kernel: LustreError: 69815:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message
Feb 27 01:14:06 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1)
Feb 27 01:14:06 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Feb 27 01:24:23 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail
Feb 27 01:24:23 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message
Feb 27 01:24:23 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551259163, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf5c7e56300/0xcb737997f998389f lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc51aa7b7f expref: -99 pid: 93562 timeout: 0 lvb_type: 0
Feb 27 01:24:23 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message
Feb 27 01:24:23 sh-101-19.int kernel: LustreError: 27647:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9be7f2604240) refcount nonzero (1) after lock cleanup; forcing cleanup.
Feb 27 01:24:23 sh-101-19.int kernel: LustreError: 27647:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message
Feb 27 01:24:23 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1)
Feb 27 01:24:23 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Feb 27 01:34:38 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail
Feb 27 01:34:38 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message
Feb 27 01:34:38 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551259778, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf44faaa1c0/0xcb737997fadb8c11 lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc51abae8a expref: -99 pid: 93562 timeout: 0 lvb_type: 0
Feb 27 01:34:38 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message
Feb 27 01:34:38 sh-101-19.int kernel: LustreError: 23509:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bf560687680) refcount nonzero (1) after lock cleanup; forcing cleanup.
Feb 27 01:34:38 sh-101-19.int kernel: LustreError: 23509:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message
Feb 27 01:34:38 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1)
Feb 27 01:34:38 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Feb 27 01:44:51 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail
Feb 27 01:44:51 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message
Feb 27 01:44:51 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551260391, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bffe0a2f080/0xcb737997fc2465f3 lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc51acdf1f expref: -99 pid: 93562 timeout: 0 lvb_type: 0
Feb 27 01:44:51 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message
Feb 27 01:44:51 sh-101-19.int kernel: LustreError: 23015:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bf46bf409c0) refcount nonzero (1) after lock cleanup; forcing cleanup.
Feb 27 01:44:51 sh-101-19.int kernel: LustreError: 23015:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message
Feb 27 01:44:51 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1)
Feb 27 01:44:51 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Feb 27 01:55:07 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail
Feb 27 01:55:07 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message
Feb 27 01:55:07 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551261007, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf3e2a77080/0xcb737997fda4f2ea lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc51ae1525 expref: -99 pid: 93562 timeout: 0 lvb_type: 0
Feb 27 01:55:07 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message
Feb 27 01:55:07 sh-101-19.int kernel: LustreError: 77740:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bf3f5ab0900) refcount nonzero (1) after lock cleanup; forcing cleanup.
Feb 27 01:55:07 sh-101-19.int kernel: LustreError: 77740:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message
Feb 27 01:55:07 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1)
Feb 27 01:55:07 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Feb 27 02:05:23 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail
Feb 27 02:05:23 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message
Feb 27 02:05:23 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551261623, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf42676af40/0xcb737997ff229994 lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc51af4c74 expref: -99 pid: 93562 timeout: 0 lvb_type: 0
Feb 27 02:05:24 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message
Feb 27 02:05:24 sh-101-19.int kernel: LustreError: 56707:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bf3d122e600) refcount nonzero (1) after lock cleanup; forcing cleanup.
Feb 27 02:05:24 sh-101-19.int kernel: LustreError: 56707:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message
Feb 27 02:05:24 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1)
Feb 27 02:05:24 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Feb 27 02:15:39 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail
Feb 27 02:15:39 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message
Feb 27 02:15:39 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551262239, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf5c7e56780/0xcb737998009c1d5b lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc51b0819a expref: -99 pid: 93562 timeout: 0 lvb_type: 0
Feb 27 02:15:39 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message
Feb 27 02:15:39 sh-101-19.int kernel: LustreError: 32719:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bf3f5ab0f00) refcount nonzero (1) after lock cleanup; forcing cleanup.
Feb 27 02:15:39 sh-101-19.int kernel: LustreError: 32719:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message
Feb 27 02:15:39 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1)
Feb 27 02:15:39 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Feb 27 02:25:55 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail
Feb 27 02:25:55 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message
Feb 27 02:25:55 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551262855, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9c03f4299200/0xcb7379980221b2e5 lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc51b1b89c expref: -99 pid: 93562 timeout: 0 lvb_type: 0
Feb 27 02:25:55 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message
Feb 27 02:25:55 sh-101-19.int kernel: LustreError: 26644:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9c0007f83200) refcount nonzero (1) after lock cleanup; forcing cleanup.
Feb 27 02:25:55 sh-101-19.int kernel: LustreError: 26644:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message
Feb 27 02:25:55 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1)
Feb 27 02:25:55 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Feb 27 02:36:15 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail
Feb 27 02:36:15 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message
Feb 27 02:36:15 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551263475, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf3e2a77500/0xcb73799803a8977a lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc51b2f2a0 expref: -99 pid: 93562 timeout: 0 lvb_type: 0
Feb 27 02:36:15 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message
Feb 27 02:36:15 sh-101-19.int kernel: LustreError: 48079:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bf40e6392c0) refcount nonzero (1) after lock cleanup; forcing cleanup.
Feb 27 02:36:15 sh-101-19.int kernel: LustreError: 48079:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message
Feb 27 02:36:15 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1)
Feb 27 02:36:15 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Feb 27 02:46:30 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail
Feb 27 02:46:30 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message
Feb 27 02:46:30 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551264090, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf53924f080/0xcb737998052979d6 lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc51b426fb expref: -99 pid: 93562 timeout: 0 lvb_type: 0
Feb 27 02:46:30 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message
Feb 27 02:46:30 sh-101-19.int kernel: LustreError: 63732:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9be5f6b9dc80) refcount nonzero (1) after lock cleanup; forcing cleanup.
Feb 27 02:46:30 sh-101-19.int kernel: LustreError: 63732:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message
Feb 27 02:46:30 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1)
Feb 27 02:46:30 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Feb 27 02:56:43 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail
Feb 27 02:56:43 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message
Feb 27 02:56:43 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551264703, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9be7d6745100/0xcb73799806a7af35 lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc51b55b10 expref: -99 pid: 93562 timeout: 0 lvb_type: 0
Feb 27 02:56:43 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message
Feb 27 02:56:43 sh-101-19.int kernel: LustreError: 78731:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bf3d0f4f980) refcount nonzero (1) after lock cleanup; forcing cleanup.
Feb 27 02:56:43 sh-101-19.int kernel: LustreError: 78731:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message
Feb 27 02:56:43 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1)
Feb 27 02:56:43 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Feb 27 03:06:59 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail
Feb 27 03:06:59 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message
Feb 27 03:06:59 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551265319, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bee38725340/0xcb737998080be6f8 lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc51b690a6 expref: -99 pid: 93562 timeout: 0 lvb_type: 0
Feb 27 03:06:59 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message
Feb 27 03:06:59 sh-101-19.int kernel: LustreError: 9407:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9be7f26046c0) refcount nonzero (1) after lock cleanup; forcing cleanup.
Feb 27 03:06:59 sh-101-19.int kernel: LustreError: 9407:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message
Feb 27 03:06:59 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1)
Feb 27 03:06:59 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Feb 27 03:17:13 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail
Feb 27 03:17:13 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message
Feb 27 03:17:13 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551265933, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf5457586c0/0xcb737998098fb86f lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc51b7c3d4 expref: -99 pid: 93562 timeout: 0 lvb_type: 0
Feb 27 03:17:13 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message
Feb 27 03:17:13 sh-101-19.int kernel: LustreError: 153809:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9be771a932c0) refcount nonzero (1) after lock cleanup; forcing cleanup.
Feb 27 03:17:13 sh-101-19.int kernel: LustreError: 153809:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message
Feb 27 03:17:13 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1)
Feb 27 03:17:13 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Feb 27 03:27:26 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail
Feb 27 03:27:26 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message
Feb 27 03:27:26 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551266546, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf534eb2640/0xcb7379980afce5fb lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc51b8f6a0 expref: -99 pid: 93562 timeout: 0 lvb_type: 0
Feb 27 03:27:26 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message
Feb 27 03:27:26 sh-101-19.int kernel: LustreError: 129299:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9c03f6b36c00) refcount nonzero (1) after lock cleanup; forcing cleanup.
Feb 27 03:27:26 sh-101-19.int kernel: LustreError: 129299:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message
Feb 27 03:27:26 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1)
Feb 27 03:27:26 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Feb 27 03:37:41 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail
Feb 27 03:37:41 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message
Feb 27 03:37:41 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551267161, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bfcabb36e40/0xcb7379980c8f3c8e lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc51ba2b41 expref: -99 pid: 93562 timeout: 0 lvb_type: 0
Feb 27 03:37:41 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message
Feb 27 03:37:41 sh-101-19.int kernel: LustreError: 85929:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9c04b7257500) refcount nonzero (1) after lock cleanup; forcing cleanup.
Feb 27 03:37:41 sh-101-19.int kernel: LustreError: 85929:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message
Feb 27 03:37:41 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1)
Feb 27 03:37:41 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Feb 27 03:47:56 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail
Feb 27 03:47:56 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message
Feb 27 03:47:56 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551267776, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf44faa8000/0xcb7379980df93a62 lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc51bb6021 expref: -99 pid: 93562 timeout: 0 lvb_type: 0
Feb 27 03:47:56 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message
Feb 27 03:47:56 sh-101-19.int kernel: LustreError: 76528:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bf4e9e52d80) refcount nonzero (1) after lock cleanup; forcing cleanup.
Feb 27 03:47:56 sh-101-19.int kernel: LustreError: 76528:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message
Feb 27 03:47:56 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1)
Feb 27 03:47:56 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Feb 27 03:58:13 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail
Feb 27 03:58:13 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message
Feb 27 03:58:13 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551268393, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf467382ac0/0xcb7379980f5d6807 lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc51bc972a expref: -99 pid: 93562 timeout: 0 lvb_type: 0
Feb 27 03:58:13 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message
Feb 27 03:58:13 sh-101-19.int kernel: LustreError: 55722:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bf4d1f42180) refcount nonzero (1) after lock cleanup; forcing cleanup.
Feb 27 03:58:13 sh-101-19.int kernel: LustreError: 55722:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message
Feb 27 03:58:13 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1)
Feb 27 03:58:13 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Feb 27 04:08:29 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail
Feb 27 04:08:29 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message
Feb 27 04:08:29 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551269009, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf506fba1c0/0xcb73799810deff67 lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc51bdcd5a expref: -99 pid: 93562 timeout: 0 lvb_type: 0
Feb 27 04:08:29 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message
Feb 27 04:08:29 sh-101-19.int kernel: LustreError: 54716:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bf44bf4b5c0) refcount nonzero (1) after lock cleanup; forcing cleanup.
Feb 27 04:08:29 sh-101-19.int kernel: LustreError: 54716:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message
Feb 27 04:08:29 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1)
Feb 27 04:08:29 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Feb 27 04:18:46 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail
Feb 27 04:18:46 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message
Feb 27 04:18:46 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551269626, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf3e13bf740/0xcb73799812652796 lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc51bf0512 expref: -99 pid: 93562 timeout: 0 lvb_type: 0
Feb 27 04:18:46 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message
Feb 27 04:18:47 sh-101-19.int kernel: LustreError: 193835:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9be75d3fa9c0) refcount nonzero (1) after lock cleanup; forcing cleanup.
Feb 27 04:18:47 sh-101-19.int kernel: LustreError: 193835:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message
Feb 27 04:18:47 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1)
Feb 27 04:18:47 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Feb 27 04:29:02 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail
Feb 27 04:29:02 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message
Feb 27 04:29:02 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551270242, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9be63514f500/0xcb73799813db5510 lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc51c03ab6 expref: -99 pid: 93562 timeout: 0 lvb_type: 0
Feb 27 04:29:02 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message
Feb 27 04:29:02 sh-101-19.int kernel: LustreError: 182729:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9be5f6ac9500) refcount nonzero (1) after lock cleanup; forcing cleanup.
Feb 27 04:29:02 sh-101-19.int kernel: LustreError: 182729:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message
Feb 27 04:29:02 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1)
Feb 27 04:29:02 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Feb 27 04:39:16 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail
Feb 27 04:39:16 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message
Feb 27 04:39:16 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551270856, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bec78f3c800/0xcb737998155faf46 lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc51c16e7e expref: -99 pid: 93562 timeout: 0 lvb_type: 0
Feb 27 04:39:16 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message
Feb 27 04:39:16 sh-101-19.int kernel: LustreError: 174416:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bf3203b6fc0) refcount nonzero (1) after lock cleanup; forcing cleanup.
Feb 27 04:39:16 sh-101-19.int kernel: LustreError: 174416:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message
Feb 27 04:39:16 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1)
Feb 27 04:39:16 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Feb 27 04:49:31 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail
Feb 27 04:49:31 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message
Feb 27 04:49:31 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551271471, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf3d128d7c0/0xcb73799816d1bca0 lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc51c2a453 expref: -99 pid: 93562 timeout: 0 lvb_type: 0
Feb 27 04:49:31 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message
Feb 27 04:49:31 sh-101-19.int kernel: LustreError: 189465:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bf3debf7440) refcount nonzero (1) after lock cleanup; forcing cleanup.
Feb 27 04:49:31 sh-101-19.int kernel: LustreError: 189465:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message
Feb 27 04:49:31 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1)
Feb 27 04:49:31 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Feb 27 04:59:47 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail
Feb 27 04:59:47 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message
Feb 27 04:59:47 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551272087, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bebae965a00/0xcb7379981849130e lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc51c3da75 expref: -99 pid: 93562 timeout: 0 lvb_type: 0
Feb 27 04:59:47 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message
Feb 27 04:59:47 sh-101-19.int kernel: LustreError: 50079:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bf5297e7740) refcount nonzero (1) after lock cleanup; forcing cleanup.
Feb 27 04:59:47 sh-101-19.int kernel: LustreError: 50079:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message
Feb 27 04:59:47 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1)
Feb 27 04:59:47 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Feb 27 05:10:03 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail
Feb 27 05:10:03 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message
Feb 27 05:10:03 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551272703, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf44faad580/0xcb73799819ac30a8 lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc51c5105f expref: -99 pid: 93562 timeout: 0 lvb_type: 0
Feb 27 05:10:03 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message
Feb 27 05:10:03 sh-101-19.int kernel: LustreError: 62933:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bf4b8702c00) refcount nonzero (1) after lock cleanup; forcing cleanup.
Feb 27 05:10:03 sh-101-19.int kernel: LustreError: 62933:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message
Feb 27 05:10:03 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1)
Feb 27 05:10:03 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Feb 27 05:20:18 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail
Feb 27 05:20:18 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message
Feb 27 05:20:18 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551273318, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf4846cd580/0xcb7379981b2e04ae lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc51c644b3 expref: -99 pid: 93562 timeout: 0 lvb_type: 0
Feb 27 05:20:18 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message
Feb 27 05:20:18 sh-101-19.int kernel: LustreError: 75748:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bf4766ff8c0) refcount nonzero (1) after lock cleanup; forcing cleanup.
Feb 27 05:20:18 sh-101-19.int kernel: LustreError: 75748:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message
Feb 27 05:20:18 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1)
Feb 27 05:20:18 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Feb 27 05:30:32 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail
Feb 27 05:30:32 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message
Feb 27 05:30:32 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551273932, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bed747ff740/0xcb7379981cb1f50b lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc51c779a8 expref: -99 pid: 93562 timeout: 0 lvb_type: 0
Feb 27 05:30:32 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message
Feb 27 05:30:32 sh-101-19.int kernel: LustreError: 165978:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bf4a5a100c0) refcount nonzero (1) after lock cleanup; forcing cleanup.
Feb 27 05:30:32 sh-101-19.int kernel: LustreError: 165978:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message
Feb 27 05:30:32 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1)
Feb 27 05:30:32 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Feb 27 05:40:46 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail
Feb 27 05:40:46 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message
Feb 27 05:40:46 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551274546, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bfb2ce67bc0/0xcb7379981e33fa11 lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc51c8ade0 expref: -99 pid: 93562 timeout: 0 lvb_type: 0
Feb 27 05:40:46 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message
Feb 27 05:40:47 sh-101-19.int kernel: LustreError: 64150:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9be801efa600) refcount nonzero (1) after lock cleanup; forcing cleanup.
Feb 27 05:40:47 sh-101-19.int kernel: LustreError: 64150:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message Feb 27 05:40:47 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1) Feb 27 05:40:47 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Feb 27 05:51:02 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail Feb 27 05:51:02 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message Feb 27 05:51:02 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551275162, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9be7b669ee40/0xcb7379981f906465 lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc51c9e3f4 expref: -99 pid: 93562 timeout: 0 lvb_type: 0 Feb 27 05:51:02 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message Feb 27 05:51:02 sh-101-19.int kernel: LustreError: 172026:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bf4766ff8c0) refcount nonzero (1) after lock cleanup; forcing cleanup. 
Feb 27 05:51:02 sh-101-19.int kernel: LustreError: 172026:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message Feb 27 05:51:02 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1) Feb 27 05:51:02 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Feb 27 06:01:17 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail Feb 27 06:01:17 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message Feb 27 06:01:17 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551275777, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf3776be9c0/0xcb73799821148314 lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc51cb1913 expref: -99 pid: 93562 timeout: 0 lvb_type: 0 Feb 27 06:01:17 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message Feb 27 06:01:17 sh-101-19.int kernel: LustreError: 110404:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bf4beb55380) refcount nonzero (1) after lock cleanup; forcing cleanup. 
Feb 27 06:01:17 sh-101-19.int kernel: LustreError: 110404:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message Feb 27 06:01:17 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1) Feb 27 06:01:17 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Feb 27 06:11:36 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail Feb 27 06:11:36 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message Feb 27 06:11:36 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551276396, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf5b1a51200/0xcb737998227ae718 lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc51cc5196 expref: -99 pid: 93562 timeout: 0 lvb_type: 0 Feb 27 06:11:36 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message Feb 27 06:11:36 sh-101-19.int kernel: LustreError: 79985:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bf56e633a40) refcount nonzero (1) after lock cleanup; forcing cleanup. 
Feb 27 06:11:36 sh-101-19.int kernel: LustreError: 79985:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message Feb 27 06:11:36 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1) Feb 27 06:11:36 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Feb 27 06:21:52 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail Feb 27 06:21:52 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message Feb 27 06:21:52 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551277012, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf3573f2f40/0xcb73799823dc5376 lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc51cd87e2 expref: -99 pid: 93562 timeout: 0 lvb_type: 0 Feb 27 06:21:52 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message Feb 27 06:21:52 sh-101-19.int kernel: LustreError: 56971:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9be802afecc0) refcount nonzero (1) after lock cleanup; forcing cleanup. 
Feb 27 06:21:52 sh-101-19.int kernel: LustreError: 56971:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message Feb 27 06:21:52 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1) Feb 27 06:21:52 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Feb 27 06:32:06 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail Feb 27 06:32:06 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message Feb 27 06:32:06 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551277626, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf503292d00/0xcb7379982554103d lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc51cebcd0 expref: -99 pid: 93562 timeout: 0 lvb_type: 0 Feb 27 06:32:06 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message Feb 27 06:32:06 sh-101-19.int kernel: LustreError: 42479:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9c0587116e40) refcount nonzero (1) after lock cleanup; forcing cleanup. 
Feb 27 06:32:06 sh-101-19.int kernel: LustreError: 42479:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message Feb 27 06:32:06 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1) Feb 27 06:32:06 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Feb 27 06:42:25 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail Feb 27 06:42:25 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message Feb 27 06:42:25 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551278245, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bfe89757080/0xcb73799826d95bfa lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc51cff4ab expref: -99 pid: 93562 timeout: 0 lvb_type: 0 Feb 27 06:42:25 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message Feb 27 06:42:25 sh-101-19.int kernel: LustreError: 193961:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9c039ff4b800) refcount nonzero (1) after lock cleanup; forcing cleanup. 
Feb 27 06:42:25 sh-101-19.int kernel: LustreError: 193961:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message Feb 27 06:42:25 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1) Feb 27 06:42:25 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Feb 27 06:52:39 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail Feb 27 06:52:39 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message Feb 27 06:52:39 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551278859, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf3ea667740/0xcb7379982862466c lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc51d128f8 expref: -99 pid: 93562 timeout: 0 lvb_type: 0 Feb 27 06:52:39 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message Feb 27 06:52:39 sh-101-19.int kernel: LustreError: 136559:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bf3fc36f680) refcount nonzero (1) after lock cleanup; forcing cleanup. 
Feb 27 06:52:39 sh-101-19.int kernel: LustreError: 136559:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message Feb 27 06:52:39 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1) Feb 27 06:52:39 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Feb 27 07:02:56 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail Feb 27 07:02:56 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message Feb 27 07:02:56 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551279476, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf2c52bd7c0/0xcb73799829efeea8 lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc51d25fd0 expref: -99 pid: 93562 timeout: 0 lvb_type: 0 Feb 27 07:02:56 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message Feb 27 07:02:56 sh-101-19.int kernel: LustreError: 137096:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bebea6d0780) refcount nonzero (1) after lock cleanup; forcing cleanup. 
Feb 27 07:02:56 sh-101-19.int kernel: LustreError: 137096:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message Feb 27 07:02:56 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1) Feb 27 07:02:56 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Feb 27 07:13:09 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail Feb 27 07:13:09 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message Feb 27 07:13:09 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551280089, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf45eb9a400/0xcb7379982b5b7818 lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc51d3944e expref: -99 pid: 93562 timeout: 0 lvb_type: 0 Feb 27 07:13:09 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message Feb 27 07:13:09 sh-101-19.int kernel: LustreError: 127457:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bf431f9a300) refcount nonzero (1) after lock cleanup; forcing cleanup. 
Feb 27 07:13:09 sh-101-19.int kernel: LustreError: 127457:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message Feb 27 07:13:09 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1) Feb 27 07:13:09 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Feb 27 07:23:26 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail Feb 27 07:23:26 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message Feb 27 07:23:26 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551280706, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9becffa5d7c0/0xcb7379982cdbf75c lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc51d4cb3b expref: -99 pid: 93562 timeout: 0 lvb_type: 0 Feb 27 07:23:26 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message Feb 27 07:23:26 sh-101-19.int kernel: LustreError: 143394:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9c03dda46300) refcount nonzero (1) after lock cleanup; forcing cleanup. 
Feb 27 07:23:26 sh-101-19.int kernel: LustreError: 143394:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message Feb 27 07:23:26 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1) Feb 27 07:23:26 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Feb 27 07:33:43 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail Feb 27 07:33:43 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message Feb 27 07:33:43 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551281323, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bee803ac380/0xcb7379982e332377 lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc51d60205 expref: -99 pid: 93562 timeout: 0 lvb_type: 0 Feb 27 07:33:43 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message Feb 27 07:33:43 sh-101-19.int kernel: LustreError: 122201:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9be801efbec0) refcount nonzero (1) after lock cleanup; forcing cleanup. 
Feb 27 07:33:43 sh-101-19.int kernel: LustreError: 122201:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message Feb 27 07:33:43 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1) Feb 27 07:33:43 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Feb 27 07:44:01 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail Feb 27 07:44:01 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message Feb 27 07:44:01 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551281941, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf4fbb8b840/0xcb7379982fa217ee lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc51d73931 expref: -99 pid: 93562 timeout: 0 lvb_type: 0 Feb 27 07:44:01 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message Feb 27 07:44:01 sh-101-19.int kernel: LustreError: 43919:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bf4163f5140) refcount nonzero (1) after lock cleanup; forcing cleanup. 
Feb 27 07:44:01 sh-101-19.int kernel: LustreError: 43919:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message Feb 27 07:44:01 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1) Feb 27 07:44:01 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Feb 27 07:54:15 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail Feb 27 07:54:15 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message Feb 27 07:54:15 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551282555, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf4de2bde80/0xcb737998310aea97 lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc51d86e57 expref: -99 pid: 93562 timeout: 0 lvb_type: 0 Feb 27 07:54:16 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message Feb 27 07:54:16 sh-101-19.int kernel: LustreError: 112827:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bf3d7f9c3c0) refcount nonzero (1) after lock cleanup; forcing cleanup. 
Feb 27 07:54:16 sh-101-19.int kernel: LustreError: 112827:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message Feb 27 07:54:16 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1) Feb 27 07:54:16 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Feb 27 08:04:31 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail Feb 27 08:04:31 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message Feb 27 08:04:31 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551283170, 301s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9be771a0cc80/0xcb7379983272cdaa lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc51d9a330 expref: -99 pid: 93562 timeout: 0 lvb_type: 0 Feb 27 08:04:31 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message Feb 27 08:04:31 sh-101-19.int kernel: LustreError: 100390:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bf44e7c8900) refcount nonzero (1) after lock cleanup; forcing cleanup. 
Feb 27 08:04:31 sh-101-19.int kernel: LustreError: 100390:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message Feb 27 08:04:31 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1) Feb 27 08:04:31 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Feb 27 08:14:50 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail Feb 27 08:14:50 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message Feb 27 08:14:50 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551283789, 301s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf502f70480/0xcb73799833cf04b9 lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc51dadb43 expref: -99 pid: 93562 timeout: 0 lvb_type: 0 Feb 27 08:14:50 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message Feb 27 08:14:50 sh-101-19.int kernel: LustreError: 50441:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bf3d6b07ec0) refcount nonzero (1) after lock cleanup; forcing cleanup. 
Feb 27 08:14:50 sh-101-19.int kernel: LustreError: 50441:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message Feb 27 08:14:50 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1) Feb 27 08:14:50 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Feb 27 08:25:04 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail Feb 27 08:25:04 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message Feb 27 08:25:04 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551284404, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf4b2216c00/0xcb73799835316514 lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc51dc0f7b expref: -99 pid: 93562 timeout: 0 lvb_type: 0 Feb 27 08:25:04 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message Feb 27 08:25:04 sh-101-19.int kernel: LustreError: 28296:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bf4a832a600) refcount nonzero (1) after lock cleanup; forcing cleanup. 
Feb 27 08:25:04 sh-101-19.int kernel: LustreError: 28296:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message Feb 27 08:25:04 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1) Feb 27 08:25:04 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Feb 27 08:35:20 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail Feb 27 08:35:20 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message Feb 27 08:35:20 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551285020, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9bf51e267500/0xcb737998369eeaf9 lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc51dd45b2 expref: -99 pid: 93562 timeout: 0 lvb_type: 0 Feb 27 08:35:20 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message Feb 27 08:35:20 sh-101-19.int kernel: LustreError: 189107:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9c03b1ebacc0) refcount nonzero (1) after lock cleanup; forcing cleanup. 
Feb 27 08:35:20 sh-101-19.int kernel: LustreError: 189107:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message Feb 27 08:35:20 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1) Feb 27 08:35:20 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Feb 27 08:42:02 sh-101-19.int kernel: Lustre: 93479:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551285676/real 1551285676] req@ffff9c03e9b0a400 x1626154109904736/t0(0) o400->MGC10.210.34.201@o2ib1@10.210.34.201@o2ib1:26/25 lens 224/224 e 0 to 1 dl 1551285720 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 Feb 27 08:45:36 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551285636, 300s ago), entering recovery for MGS@MGC10.210.34.201@o2ib1_0 ns: MGC10.210.34.201@o2ib1 lock: ffff9c03d13ea640/0xcb737998380b3f5b lrc: 4/1,0 mode: --/CR res: [0x6c61676572:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x97b806fc51de7b6b expref: -99 pid: 93562 timeout: 0 lvb_type: 0 Feb 27 08:45:36 sh-101-19.int kernel: LustreError: 93562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message Feb 27 08:50:24 sh-101-19.int kernel: Lustre: Evicted from MGS (at MGC10.210.34.201@o2ib1_0) after server handle changed from 0x97b806fc5160fbbe to 0x5099a72de8d6fd7 Feb 27 08:50:24 sh-101-19.int kernel: LustreError: 70291:0:(ldlm_resource.c:1146:ldlm_resource_complain()) MGC10.210.34.201@o2ib1: namespace resource [0x6c61676572:0x2:0x0].0x0 (ffff9bf927fde600) refcount nonzero (1) after lock cleanup; forcing cleanup. 
Feb 27 08:50:24 sh-101-19.int kernel: LustreError: 70291:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message Feb 27 08:50:24 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1) Feb 27 08:50:24 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Feb 27 22:01:45 sh-101-19.int kernel: EXT4-fs (loop1): mounting ext3 file system using the ext4 subsystem Feb 27 22:01:45 sh-101-19.int kernel: EXT4-fs (loop1): mounted filesystem with ordered data mode. Opts: errors=remount-ro Feb 28 15:46:27 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Feb 28 15:49:34 sh-101-19.int kernel: Lustre: fir-MDT0002-mdc-ffff9c05b8776000: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7) Mar 04 01:38:15 sh-101-19.int kernel: Lustre: 20910:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551691694/real 1551691694] req@ffff9be8f2d12d00 x1626157435913632/t0(0) o36->fir-MDT0002-mdc-ffff9c05b8776000@10.0.10.51@o2ib7:12/10 lens 536/1752 e 5 to 1 dl 1551692295 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Mar 04 01:38:15 sh-101-19.int kernel: Lustre: fir-MDT0002-mdc-ffff9c05b8776000: Connection to fir-MDT0002 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 04 01:38:15 sh-101-19.int kernel: Lustre: Skipped 3 previous similar messages Mar 04 01:38:15 sh-101-19.int kernel: Lustre: fir-MDT0002-mdc-ffff9c05b8776000: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7) Mar 04 01:38:15 sh-101-19.int kernel: Lustre: Skipped 3 previous similar messages Mar 04 01:38:22 sh-101-19.int kernel: Lustre: 6279:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551691700/real 
1551691700] req@ffff9bee62026f00 x1626157436440464/t0(0) o36->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 520/1752 e 5 to 1 dl 1551692302 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
Mar 04 01:38:22 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 04 01:48:13 sh-101-19.int kernel: Lustre: 20914:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551692302/real 1551692302] req@ffff9c04be244b00 x1626157442838560/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 480/568 e 0 to 1 dl 1551692893 ref 2 fl Rpc:X/2/ffffffff rc -11/-1
Mar 04 01:48:13 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 04 01:48:14 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 04 01:48:14 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Mar 04 01:48:16 sh-101-19.int kernel: Lustre: 20910:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551692295/real 1551692295] req@ffff9be8f2d12d00 x1626157435913632/t0(0) o36->fir-MDT0002-mdc-ffff9c05b8776000@10.0.10.51@o2ib7:12/10 lens 536/1752 e 5 to 1 dl 1551692896 ref 2 fl Rpc:X/2/ffffffff rc 0/-1
Mar 04 01:48:16 sh-101-19.int kernel: Lustre: fir-MDT0002-mdc-ffff9c05b8776000: Connection to fir-MDT0002 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 04 01:58:05 sh-101-19.int kernel: Lustre: 20914:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551692893/real 1551692893] req@ffff9c04be244b00 x1626157442838560/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 480/568 e 0 to 1 dl 1551693484 ref 2 fl Rpc:X/2/ffffffff rc -11/-1
Mar 04 01:58:05 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 04 01:58:05 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 04 01:58:05 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Mar 04 01:58:17 sh-101-19.int kernel: Lustre: 20910:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551692896/real 1551692896] req@ffff9be8f2d12d00 x1626157435913632/t0(0) o36->fir-MDT0002-mdc-ffff9c05b8776000@10.0.10.51@o2ib7:12/10 lens 536/1752 e 5 to 1 dl 1551693497 ref 2 fl Rpc:X/2/ffffffff rc -11/-1
Mar 04 01:58:17 sh-101-19.int kernel: Lustre: fir-MDT0002-mdc-ffff9c05b8776000: Connection to fir-MDT0002 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 04 02:07:56 sh-101-19.int kernel: Lustre: 20914:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551693485/real 1551693485] req@ffff9c04be244b00 x1626157442838560/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 480/568 e 0 to 1 dl 1551694076 ref 2 fl Rpc:X/2/ffffffff rc -11/-1
Mar 04 02:07:56 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 04 02:07:56 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 04 02:07:56 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Mar 04 02:17:47 sh-101-19.int kernel: Lustre: 20914:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551694076/real 1551694076] req@ffff9c04be244b00 x1626157442838560/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 480/568 e 0 to 1 dl 1551694667 ref 2 fl Rpc:X/2/ffffffff rc -11/-1
Mar 04 02:17:47 sh-101-19.int kernel: Lustre: 20914:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Mar 04 02:17:47 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 04 02:17:47 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Mar 04 02:17:47 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 04 02:17:47 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Mar 04 02:27:38 sh-101-19.int kernel: Lustre: 20914:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551694667/real 1551694667] req@ffff9c04be244b00 x1626157442838560/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 480/568 e 0 to 1 dl 1551695258 ref 2 fl Rpc:X/2/ffffffff rc -11/-1
Mar 04 02:27:38 sh-101-19.int kernel: Lustre: 20914:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Mar 04 02:27:38 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 04 02:27:38 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Mar 04 02:27:38 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 04 02:27:38 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Mar 04 02:37:29 sh-101-19.int kernel: Lustre: 20914:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551695258/real 1551695258] req@ffff9c04be244b00 x1626157442838560/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 480/568 e 0 to 1 dl 1551695849 ref 2 fl Rpc:X/2/ffffffff rc -11/-1
Mar 04 02:37:29 sh-101-19.int kernel: Lustre: 20914:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Mar 04 02:37:29 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 04 02:37:29 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Mar 04 02:37:29 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 04 02:37:29 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Mar 04 02:47:20 sh-101-19.int kernel: Lustre: 20914:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551695849/real 1551695849] req@ffff9c04be244b00 x1626157442838560/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 480/568 e 0 to 1 dl 1551696440 ref 2 fl Rpc:X/2/ffffffff rc -11/-1
Mar 04 02:47:20 sh-101-19.int kernel: Lustre: 20914:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Mar 04 02:47:20 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 04 02:47:20 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Mar 04 02:48:22 sh-101-19.int kernel: Lustre: fir-MDT0002-mdc-ffff9c05b8776000: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7)
Mar 04 02:48:22 sh-101-19.int kernel: Lustre: Skipped 2 previous similar messages
Mar 04 02:57:11 sh-101-19.int kernel: Lustre: 20914:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551696440/real 1551696440] req@ffff9c04be244b00 x1626157442838560/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 480/568 e 0 to 1 dl 1551697031 ref 2 fl Rpc:X/2/ffffffff rc -11/-1
Mar 04 02:57:11 sh-101-19.int kernel: Lustre: 20914:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Mar 04 02:57:11 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 04 02:57:11 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Mar 04 02:58:23 sh-101-19.int kernel: Lustre: fir-MDT0002-mdc-ffff9c05b8776000: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7)
Mar 04 02:58:23 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Mar 04 03:08:24 sh-101-19.int kernel: Lustre: 20910:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551697103/real 1551697103] req@ffff9be8f2d12d00 x1626157435913632/t0(0) o36->fir-MDT0002-mdc-ffff9c05b8776000@10.0.10.51@o2ib7:12/10 lens 536/1752 e 5 to 1 dl 1551697704 ref 2 fl Rpc:X/2/ffffffff rc -11/-1
Mar 04 03:08:24 sh-101-19.int kernel: Lustre: 20910:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
Mar 04 03:08:24 sh-101-19.int kernel: Lustre: fir-MDT0002-mdc-ffff9c05b8776000: Connection to fir-MDT0002 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 04 03:08:25 sh-101-19.int kernel: Lustre: Skipped 2 previous similar messages
Mar 04 03:08:25 sh-101-19.int kernel: Lustre: fir-MDT0002-mdc-ffff9c05b8776000: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7)
Mar 04 03:08:25 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Mar 04 03:18:26 sh-101-19.int kernel: Lustre: 20910:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551697705/real 1551697705] req@ffff9be8f2d12d00 x1626157435913632/t0(0) o36->fir-MDT0002-mdc-ffff9c05b8776000@10.0.10.51@o2ib7:12/10 lens 536/1752 e 5 to 1 dl 1551698306 ref 2 fl Rpc:X/2/ffffffff rc -11/-1
Mar 04 03:18:26 sh-101-19.int kernel: Lustre: 20910:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Mar 04 03:18:26 sh-101-19.int kernel: Lustre: fir-MDT0002-mdc-ffff9c05b8776000: Connection to fir-MDT0002 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 04 03:18:26 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Mar 04 03:18:26 sh-101-19.int kernel: Lustre: fir-MDT0002-mdc-ffff9c05b8776000: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7)
Mar 04 03:18:26 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Mar 04 03:28:27 sh-101-19.int kernel: Lustre: 20910:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551698306/real 1551698306] req@ffff9be8f2d12d00 x1626157435913632/t0(0) o36->fir-MDT0002-mdc-ffff9c05b8776000@10.0.10.51@o2ib7:12/10 lens 536/1752 e 5 to 1 dl 1551698907 ref 2 fl Rpc:X/2/ffffffff rc -11/-1
Mar 04 03:28:27 sh-101-19.int kernel: Lustre: 20910:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Mar 04 03:28:27 sh-101-19.int kernel: Lustre: fir-MDT0002-mdc-ffff9c05b8776000: Connection to fir-MDT0002 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 04 03:28:27 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Mar 04 03:28:27 sh-101-19.int kernel: Lustre: fir-MDT0002-mdc-ffff9c05b8776000: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7)
Mar 04 03:28:27 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Mar 04 03:38:28 sh-101-19.int kernel: Lustre: 20910:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551698907/real 1551698907] req@ffff9be8f2d12d00 x1626157435913632/t0(0) o36->fir-MDT0002-mdc-ffff9c05b8776000@10.0.10.51@o2ib7:12/10 lens 536/1752 e 5 to 1 dl 1551699508 ref 2 fl Rpc:X/2/ffffffff rc -11/-1
Mar 04 03:38:28 sh-101-19.int kernel: Lustre: 20910:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Mar 04 03:38:28 sh-101-19.int kernel: Lustre: fir-MDT0002-mdc-ffff9c05b8776000: Connection to fir-MDT0002 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 04 03:38:28 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Mar 04 03:38:28 sh-101-19.int kernel: Lustre: fir-MDT0002-mdc-ffff9c05b8776000: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7)
Mar 04 03:38:28 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Mar 04 03:48:29 sh-101-19.int kernel: Lustre: 20910:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551699508/real 1551699508] req@ffff9be8f2d12d00 x1626157435913632/t0(0) o36->fir-MDT0002-mdc-ffff9c05b8776000@10.0.10.51@o2ib7:12/10 lens 536/1752 e 5 to 1 dl 1551700109 ref 2 fl Rpc:X/2/ffffffff rc -11/-1
Mar 04 03:48:29 sh-101-19.int kernel: Lustre: 20910:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Mar 04 03:48:29 sh-101-19.int kernel: Lustre: fir-MDT0002-mdc-ffff9c05b8776000: Connection to fir-MDT0002 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 04 03:48:29 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Mar 04 03:48:29 sh-101-19.int kernel: Lustre: fir-MDT0002-mdc-ffff9c05b8776000: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7)
Mar 04 03:48:29 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Mar 04 03:58:30 sh-101-19.int kernel: Lustre: 20910:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551700109/real 1551700109] req@ffff9be8f2d12d00 x1626157435913632/t0(0) o36->fir-MDT0002-mdc-ffff9c05b8776000@10.0.10.51@o2ib7:12/10 lens 536/1752 e 5 to 1 dl 1551700710 ref 2 fl Rpc:X/2/ffffffff rc -11/-1
Mar 04 03:58:30 sh-101-19.int kernel: Lustre: 20910:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Mar 04 03:58:30 sh-101-19.int kernel: Lustre: fir-MDT0002-mdc-ffff9c05b8776000: Connection to fir-MDT0002 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 04 03:58:30 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Mar 04 03:58:30 sh-101-19.int kernel: Lustre: fir-MDT0002-mdc-ffff9c05b8776000: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7)
Mar 04 03:58:30 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Mar 04 04:08:31 sh-101-19.int kernel: Lustre: 20910:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551700710/real 1551700710] req@ffff9be8f2d12d00 x1626157435913632/t0(0) o36->fir-MDT0002-mdc-ffff9c05b8776000@10.0.10.51@o2ib7:12/10 lens 536/1752 e 5 to 1 dl 1551701311 ref 2 fl Rpc:X/2/ffffffff rc -11/-1
Mar 04 04:08:31 sh-101-19.int kernel: Lustre: 20910:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Mar 04 04:08:31 sh-101-19.int kernel: Lustre: fir-MDT0002-mdc-ffff9c05b8776000: Connection to fir-MDT0002 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 04 04:08:31 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Mar 04 04:08:31 sh-101-19.int kernel: Lustre: fir-MDT0002-mdc-ffff9c05b8776000: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7)
Mar 04 04:08:31 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Mar 04 04:18:32 sh-101-19.int kernel: Lustre: 20910:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551701311/real 1551701311] req@ffff9be8f2d12d00 x1626157435913632/t0(0) o36->fir-MDT0002-mdc-ffff9c05b8776000@10.0.10.51@o2ib7:12/10 lens 536/1752 e 5 to 1 dl 1551701912 ref 2 fl Rpc:X/2/ffffffff rc -11/-1
Mar 04 04:18:32 sh-101-19.int kernel: Lustre: 20910:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Mar 04 04:18:32 sh-101-19.int kernel: Lustre: fir-MDT0002-mdc-ffff9c05b8776000: Connection to fir-MDT0002 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 04 04:18:32 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Mar 04 04:18:32 sh-101-19.int kernel: Lustre: fir-MDT0002-mdc-ffff9c05b8776000: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7)
Mar 04 04:18:32 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Mar 04 04:28:33 sh-101-19.int kernel: Lustre: 20910:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551701912/real 1551701912] req@ffff9be8f2d12d00 x1626157435913632/t0(0) o36->fir-MDT0002-mdc-ffff9c05b8776000@10.0.10.51@o2ib7:12/10 lens 536/1752 e 5 to 1 dl 1551702513 ref 2 fl Rpc:X/2/ffffffff rc -11/-1
Mar 04 04:28:33 sh-101-19.int kernel: Lustre: 20910:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Mar 04 04:28:33 sh-101-19.int kernel: Lustre: fir-MDT0002-mdc-ffff9c05b8776000: Connection to fir-MDT0002 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 04 04:28:33 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Mar 04 04:28:33 sh-101-19.int kernel: Lustre: fir-MDT0002-mdc-ffff9c05b8776000: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7)
Mar 04 04:28:33 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Mar 04 04:38:34 sh-101-19.int kernel: Lustre: 20910:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551702513/real 1551702513] req@ffff9be8f2d12d00 x1626157435913632/t0(0) o36->fir-MDT0002-mdc-ffff9c05b8776000@10.0.10.51@o2ib7:12/10 lens 536/1752 e 5 to 1 dl 1551703114 ref 2 fl Rpc:X/2/ffffffff rc -11/-1
Mar 04 04:38:34 sh-101-19.int kernel: Lustre: 20910:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Mar 04 04:38:34 sh-101-19.int kernel: Lustre: fir-MDT0002-mdc-ffff9c05b8776000: Connection to fir-MDT0002 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 04 04:38:34 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Mar 04 04:38:34 sh-101-19.int kernel: Lustre: fir-MDT0002-mdc-ffff9c05b8776000: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7)
Mar 04 04:38:34 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Mar 04 04:48:35 sh-101-19.int kernel: Lustre: 20910:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551703114/real 1551703114] req@ffff9be8f2d12d00 x1626157435913632/t0(0) o36->fir-MDT0002-mdc-ffff9c05b8776000@10.0.10.51@o2ib7:12/10 lens 536/1752 e 5 to 1 dl 1551703715 ref 2 fl Rpc:X/2/ffffffff rc -11/-1
Mar 04 04:48:35 sh-101-19.int kernel: Lustre: 20910:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Mar 04 04:48:35 sh-101-19.int kernel: Lustre: fir-MDT0002-mdc-ffff9c05b8776000: Connection to fir-MDT0002 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 04 04:48:35 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Mar 04 04:48:35 sh-101-19.int kernel: Lustre: fir-MDT0002-mdc-ffff9c05b8776000: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7)
Mar 04 04:48:35 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Mar 04 04:58:36 sh-101-19.int kernel: Lustre: 20910:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551703715/real 1551703715] req@ffff9be8f2d12d00 x1626157435913632/t0(0) o36->fir-MDT0002-mdc-ffff9c05b8776000@10.0.10.51@o2ib7:12/10 lens 536/1752 e 5 to 1 dl 1551704316 ref 2 fl Rpc:X/2/ffffffff rc -11/-1
Mar 04 04:58:36 sh-101-19.int kernel: Lustre: 20910:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Mar 04 04:58:36 sh-101-19.int kernel: Lustre: fir-MDT0002-mdc-ffff9c05b8776000: Connection to fir-MDT0002 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 04 04:58:36 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Mar 04 04:58:36 sh-101-19.int kernel: Lustre: fir-MDT0002-mdc-ffff9c05b8776000: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7)
Mar 04 04:58:36 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Mar 04 05:01:15 sh-101-19.int kernel: LustreError: 11-0: fir-MDT0002-mdc-ffff9c05b8776000: operation mds_reint to node 10.0.10.51@o2ib7 failed: rc = -19
Mar 04 05:01:34 sh-101-19.int kernel: LustreError: 11-0: fir-MDT0001-mdc-ffff9c05b8776000: operation mds_reint to node 10.0.10.52@o2ib7 failed: rc = -19
Mar 04 05:02:42 sh-101-19.int kernel: LustreError: 93465:0:(client.c:3023:ptlrpc_replay_interpret()) @@@ status 301, old was 0 req@ffff9be8f2d16c00 x1626157435890672/t60407812521(60407812521) o101->fir-MDT0002-mdc-ffff9c05b8776000@10.0.10.51@o2ib7:12/10 lens 1160/560 e 0 to 0 dl 1551705318 ref 2 fl Interpret:RP/4/0 rc 301/301
Mar 04 05:03:45 sh-101-19.int kernel: LustreError: 93465:0:(client.c:3023:ptlrpc_replay_interpret()) @@@ status 301, old was 0 req@ffff9bfa0d3efb00 x1626156389492544/t49396696132(49396696132) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 1768/560 e 0 to 0 dl 1551705381 ref 2 fl Interpret:RP/4/0 rc 301/301
Mar 04 05:03:45 sh-101-19.int kernel: LustreError: 93465:0:(client.c:3023:ptlrpc_replay_interpret()) Skipped 2 previous similar messages
Mar 04 05:14:09 sh-101-19.int kernel: Lustre: 93476:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551704492/real 1551704492] req@ffff9bea2da58f00 x1626157444201712/t0(0) o400->fir-MDT0000-mdc-ffff9c05b8776000@10.0.10.51@o2ib7:12/10 lens 224/224 e 0 to 1 dl 1551705248 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Mar 04 05:14:09 sh-101-19.int kernel: Lustre: 93476:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Mar 04 16:49:11 sh-101-19.int kernel: NFS: nfs4_reclaim_open_state: unhandled error -521
Mar 05 11:32:23 sh-101-19.int kernel: Lustre: 6279:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551813587/real 1551813587] req@ffff9be94cc78f00 x1626161573311296/t0(0) o36->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 520/1752 e 0 to 1 dl 1551814343 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
Mar 05 11:32:23 sh-101-19.int kernel: Lustre: 6279:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
Mar 05 11:32:23 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 05 11:32:23 sh-101-19.int kernel: Lustre: Skipped 4 previous similar messages
Mar 05 11:32:23 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 05 11:32:23 sh-101-19.int kernel: Lustre: Skipped 4 previous similar messages
Mar 05 11:44:59 sh-101-19.int kernel: Lustre: 6279:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551814343/real 1551814343] req@ffff9be94cc78f00 x1626161573311296/t0(0) o36->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 520/1752 e 0 to 1 dl 1551815099 ref 2 fl Rpc:X/2/ffffffff rc 0/-1
Mar 05 11:44:59 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 05 11:44:59 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 05 11:57:35 sh-101-19.int kernel: Lustre: 6279:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551815099/real 1551815099] req@ffff9be94cc78f00 x1626161573311296/t0(0) o36->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 520/1752 e 0 to 1 dl 1551815855 ref 2 fl Rpc:X/2/ffffffff rc -11/-1
Mar 05 11:57:35 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 05 11:57:35 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 05 12:02:59 sh-101-19.int kernel: LustreError: 11-0: fir-MDT0003-mdc-ffff9c05b8776000: operation ldlm_enqueue to node 10.0.10.52@o2ib7 failed: rc = -19
Mar 05 12:02:59 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 05 12:02:59 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message
Mar 05 12:04:02 sh-101-19.int kernel: LustreError: 93465:0:(client.c:3023:ptlrpc_replay_interpret()) @@@ status 301, old was 0 req@ffff9be8787fe300 x1626157724142096/t51542772977(51542772977) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 1768/560 e 0 to 0 dl 1551816998 ref 2 fl Interpret:RP/4/0 rc 301/301
Mar 05 12:04:02 sh-101-19.int kernel: LustreError: 93465:0:(client.c:3023:ptlrpc_replay_interpret()) Skipped 18 previous similar messages
Mar 05 12:08:25 sh-101-19.int kernel: LustreError: 11-0: fir-MDT0000-mdc-ffff9c05b8776000: operation ldlm_enqueue to node 10.0.10.51@o2ib7 failed: rc = -19
Mar 05 12:08:25 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message
Mar 05 12:09:02 sh-101-19.int kernel: LustreError: 131902:0:(file.c:4393:ll_inode_revalidate_fini()) fir: revalidate FID [0x200000007:0x1:0x0] error: rc = -4
Mar 05 12:11:04 sh-101-19.int kernel: LustreError: 11-0: fir-MDT0000-mdc-ffff9c05b8776000: operation ldlm_enqueue to node 10.0.10.51@o2ib7 failed: rc = -19
Mar 05 12:16:38 sh-101-19.int kernel: Lustre: 93465:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551816242/real 1551816242] req@ffff9bfdf3589500 x1626161615375120/t0(0) o400->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 224/224 e 0 to 1 dl 1551816998 ref 1 fl Rpc:X/c0/ffffffff rc 0/-1
Mar 05 12:16:38 sh-101-19.int kernel: LustreError: 93465:0:(import.c:1247:ptlrpc_connect_interpret()) fir-MDT0003_UUID went back in time (transno 58684140190 was previously committed, server now claims 58464140959)! See https://bugzilla.lustre.org/show_bug.cgi?id=9646
Mar 05 12:16:38 sh-101-19.int kernel: LustreError: 93465:0:(client.c:3023:ptlrpc_replay_interpret()) @@@ status 301, old was 0 req@ffff9bf5dbe11e00 x1626161422342240/t58464140664(58464140664) o101->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 1768/560 e 0 to 0 dl 1551817754 ref 2 fl Interpret:RP/4/0 rc 301/301
Mar 05 12:16:38 sh-101-19.int kernel: LustreError: 93465:0:(client.c:3023:ptlrpc_replay_interpret()) Skipped 11 previous similar messages
Mar 05 12:18:46 sh-101-19.int kernel: LustreError: 107045:0:(file.c:4393:ll_inode_revalidate_fini()) fir: revalidate FID [0x200000007:0x1:0x0] error: rc = -4
Mar 05 12:26:12 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7)
Mar 05 12:26:14 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 05 12:26:14 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Mar 05 12:38:52 sh-101-19.int kernel: Lustre: 20908:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551817576/real 1551817576] req@ffff9bf44dda7b00 x1626161615533136/t0(0) o101->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 584/2088 e 0 to 1 dl 1551818332 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
Mar 05 12:38:52 sh-101-19.int kernel: Lustre: 20908:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
Mar 05 12:38:52 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 05 12:38:52 sh-101-19.int kernel: Lustre: Skipped 3 previous similar messages
Mar 05 12:38:52 sh-101-19.int kernel: LustreError: 93465:0:(import.c:1247:ptlrpc_connect_interpret()) fir-MDT0003_UUID went back in time (transno 58684140190 was previously committed, server now claims 58464140959)! See https://bugzilla.lustre.org/show_bug.cgi?id=9646
Mar 05 12:38:52 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 05 12:38:52 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Mar 05 12:51:28 sh-101-19.int kernel: Lustre: 20908:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551818332/real 1551818332] req@ffff9bf44dda7b00 x1626161615533136/t0(0) o101->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 584/2088 e 0 to 1 dl 1551819088 ref 2 fl Rpc:X/2/ffffffff rc 0/-1
Mar 05 12:51:28 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 05 12:51:28 sh-101-19.int kernel: LustreError: 93465:0:(import.c:1247:ptlrpc_connect_interpret()) fir-MDT0003_UUID went back in time (transno 58684140190 was previously committed, server now claims 58464140959)! See https://bugzilla.lustre.org/show_bug.cgi?id=9646
Mar 05 12:51:28 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 05 13:04:04 sh-101-19.int kernel: Lustre: 20908:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551819088/real 1551819088] req@ffff9bf44dda7b00 x1626161615533136/t0(0) o101->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 584/2088 e 0 to 1 dl 1551819844 ref 2 fl Rpc:X/2/ffffffff rc 0/-1
Mar 05 13:04:04 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 05 13:04:04 sh-101-19.int kernel: LustreError: 93465:0:(import.c:1247:ptlrpc_connect_interpret()) fir-MDT0003_UUID went back in time (transno 58684140190 was previously committed, server now claims 58464140959)! See https://bugzilla.lustre.org/show_bug.cgi?id=9646
Mar 05 13:04:04 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 05 23:00:27 sh-101-19.int kernel: Lustre: 93485:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551855620/real 1551855620] req@ffff9c0507dfa400 x1626162349485776/t0(0) o103->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:17/18 lens 328/224 e 0 to 1 dl 1551855627 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
Mar 05 23:00:27 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 05 23:00:27 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 07 18:44:04 sh-101-19.int kernel: python invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0
Mar 07 18:44:04 sh-101-19.int kernel: python cpuset=step_0 mems_allowed=0-1
Mar 07 18:44:04 sh-101-19.int kernel: CPU: 19 PID: 40340 Comm: python Kdump: loaded Tainted: G OE ------------ T 3.10.0-957.5.1.el7.x86_64 #1
Mar 07 18:44:04 sh-101-19.int kernel: Hardware name: Dell Inc. PowerEdge C6320/082F9M, BIOS 2.8.0 05/28/2018
Mar 07 18:44:04 sh-101-19.int kernel: Call Trace:
Mar 07 18:44:04 sh-101-19.int kernel: [] dump_stack+0x19/0x1b
Mar 07 18:44:04 sh-101-19.int kernel: [] dump_header+0x90/0x229
Mar 07 18:44:04 sh-101-19.int kernel: [] ? default_wake_function+0x12/0x20
Mar 07 18:44:04 sh-101-19.int kernel: [] ? find_lock_task_mm+0x56/0xc0
Mar 07 18:44:04 sh-101-19.int kernel: [] ? try_get_mem_cgroup_from_mm+0x28/0x60
Mar 07 18:44:04 sh-101-19.int kernel: [] oom_kill_process+0x254/0x3d0
Mar 07 18:44:04 sh-101-19.int kernel: [] mem_cgroup_oom_synchronize+0x546/0x570
Mar 07 18:44:04 sh-101-19.int kernel: [] ? mem_cgroup_charge_common+0xc0/0xc0
Mar 07 18:44:04 sh-101-19.int kernel: [] pagefault_out_of_memory+0x14/0x90
Mar 07 18:44:04 sh-101-19.int kernel: [] mm_fault_error+0x6a/0x157
Mar 07 18:44:04 sh-101-19.int kernel: [] __do_page_fault+0x3c8/0x500
Mar 07 18:44:04 sh-101-19.int kernel: [] do_page_fault+0x35/0x90
Mar 07 18:44:04 sh-101-19.int kernel: [] page_fault+0x28/0x30
Mar 07 18:44:04 sh-101-19.int kernel: Task in /slurm/uid_30356/job_38751303/step_0/task_0 killed as a result of limit of /slurm/uid_30356/job_38751303
Mar 07 18:44:04 sh-101-19.int kernel: memory: usage 15728640kB, limit 15728640kB, failcnt 19952
Mar 07 18:44:04 sh-101-19.int kernel: memory+swap: usage 15728640kB, limit 15728640kB, failcnt 0
Mar 07 18:44:04 sh-101-19.int kernel: kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
Mar 07 18:44:04 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_30356/job_38751303: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 07 18:44:04 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_30356/job_38751303/step_extern: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 07 18:44:04 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_30356/job_38751303/step_extern/task_0: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 07 18:44:04 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_30356/job_38751303/step_batch: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 07 18:44:04 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_30356/job_38751303/step_batch/task_0: cache:0KB rss:4484KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:2296KB active_anon:2188KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 07 18:44:04 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_30356/job_38751303/step_0: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 07 18:44:04 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_30356/job_38751303/step_0/task_0: cache:0KB rss:15724156KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:413564KB active_anon:15310592KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 07 18:44:04 sh-101-19.int kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Mar 07 18:44:04 sh-101-19.int kernel: [39733] 0 39733 26988 88 10 0 0 sleep
Mar 07 18:44:04 sh-101-19.int kernel: [39798] 30356 39798 28334 441 13 0 0 slurm_script
Mar 07 18:44:04 sh-101-19.int kernel: [40021] 30356 40021 80977 1477 40 0 0 srun
Mar 07 18:44:04 sh-101-19.int kernel: [40310] 30356 40310 13100 217 29 0 0 srun
Mar 07 18:44:04 sh-101-19.int kernel: [40340] 30356 40340 5401570 3934224 10463 0 0 python
Mar 07 18:44:04 sh-101-19.int kernel: Memory cgroup out of memory: Kill process 40340 (python) score 1003 or sacrifice child
Mar 07 18:44:04 sh-101-19.int kernel: Killed process 40340 (python) total-vm:21606280kB, anon-rss:15724088kB, file-rss:12808kB, shmem-rss:0kB
Mar 08 11:08:31 sh-101-19.int kernel: R invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0
Mar 08 11:08:31 sh-101-19.int kernel: R cpuset=step_batch mems_allowed=0-1
Mar 08 11:08:31 sh-101-19.int kernel: CPU: 13 PID: 22596 Comm: R Kdump: loaded Tainted: G OE ------------ T 3.10.0-957.5.1.el7.x86_64 #1
Mar 08 11:08:31 sh-101-19.int kernel: Hardware name: Dell Inc. PowerEdge C6320/082F9M, BIOS 2.8.0 05/28/2018
Mar 08 11:08:31 sh-101-19.int kernel: Call Trace:
Mar 08 11:08:31 sh-101-19.int kernel: [] dump_stack+0x19/0x1b
Mar 08 11:08:31 sh-101-19.int kernel: [] dump_header+0x90/0x229
Mar 08 11:08:31 sh-101-19.int kernel: [] ? default_wake_function+0x12/0x20
Mar 08 11:08:31 sh-101-19.int kernel: [] ? find_lock_task_mm+0x56/0xc0
Mar 08 11:08:31 sh-101-19.int kernel: [] ? try_get_mem_cgroup_from_mm+0x28/0x60
Mar 08 11:08:31 sh-101-19.int kernel: [] oom_kill_process+0x254/0x3d0
Mar 08 11:08:31 sh-101-19.int kernel: [] mem_cgroup_oom_synchronize+0x546/0x570
Mar 08 11:08:31 sh-101-19.int kernel: [] ? mem_cgroup_charge_common+0xc0/0xc0
Mar 08 11:08:31 sh-101-19.int kernel: [] pagefault_out_of_memory+0x14/0x90
Mar 08 11:08:31 sh-101-19.int kernel: [] mm_fault_error+0x6a/0x157
Mar 08 11:08:31 sh-101-19.int kernel: [] __do_page_fault+0x3c8/0x500
Mar 08 11:08:31 sh-101-19.int kernel: [] do_page_fault+0x35/0x90
Mar 08 11:08:31 sh-101-19.int kernel: [] page_fault+0x28/0x30
Mar 08 11:08:32 sh-101-19.int kernel: Task in /slurm/uid_329096/job_38758664/step_batch/task_0 killed as a result of limit of /slurm/uid_329096/job_38758664/step_batch
Mar 08 11:08:32 sh-101-19.int kernel: memory: usage 49152000kB, limit 49152000kB, failcnt 530484
Mar 08 11:08:32 sh-101-19.int kernel: memory+swap: usage 49152000kB, limit 49152000kB, failcnt 0
Mar 08 11:08:32 sh-101-19.int kernel: kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
Mar 08 11:08:32 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_329096/job_38758664/step_batch: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 08 11:08:32 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_329096/job_38758664/step_batch/task_0: cache:0KB rss:49152000KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:4397372KB active_anon:44754596KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 08 11:08:32 sh-101-19.int kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Mar 08 11:08:32 sh-101-19.int kernel: [22382] 329096 22382 28362 455 14 0 0 slurm_script
Mar 08 11:08:32 sh-101-19.int kernel: [22586] 329096 22586 28336 426 13 0 0 sh
Mar 08 11:08:32 sh-101-19.int kernel: [22596] 329096 22596 12483572 12289576 24076 0 0 R
Mar 08 11:08:32 sh-101-19.int kernel: Memory cgroup out of memory: Kill process 22596 (R) score 1002 or sacrifice child
Mar 08 11:08:32 sh-101-19.int kernel: Killed process 22596 (R) total-vm:49934288kB, anon-rss:49150992kB, file-rss:7312kB, shmem-rss:0kB
Mar 08 15:37:02 sh-101-19.int kernel:
Lustre: 173648:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552087621/real 1552087621] req@ffff9c04c1664e00 x1626164621713920/t0(0) o36->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 528/1752 e 2 to 1 dl 1552088222 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Mar 08 15:37:02 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 08 15:37:02 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 08 15:43:09 sh-101-19.int kernel: Lustre: 89529:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552087936/real 1552087936] req@ffff9c04bd7aa100 x1626164621768640/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 600/2088 e 0 to 1 dl 1552088589 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Mar 08 15:43:09 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 08 15:43:09 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 08 15:47:03 sh-101-19.int kernel: Lustre: 173648:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552088222/real 1552088222] req@ffff9c04c1664e00 x1626164621713920/t0(0) o36->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 528/1752 e 2 to 1 dl 1552088823 ref 2 fl Rpc:X/2/ffffffff rc 0/-1 Mar 08 15:47:03 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 08 15:47:03 sh-101-19.int kernel: 
Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 08 15:52:03 sh-101-19.int kernel: INFO: task migratefs:173653 blocked for more than 120 seconds. Mar 08 15:52:03 sh-101-19.int kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Mar 08 15:52:03 sh-101-19.int kernel: migratefs D ffff9c05f880c100 0 173653 1 0x00000080 Mar 08 15:52:03 sh-101-19.int kernel: Call Trace: Mar 08 15:52:03 sh-101-19.int kernel: [] ? ll_get_acl+0x31/0xf0 [lustre] Mar 08 15:52:03 sh-101-19.int kernel: [] ? ll_dcompare+0x72/0x2e0 [lustre] Mar 08 15:52:03 sh-101-19.int kernel: [] schedule_preempt_disabled+0x29/0x70 Mar 08 15:52:03 sh-101-19.int kernel: [] __mutex_lock_slowpath+0xc7/0x1d0 Mar 08 15:52:03 sh-101-19.int kernel: [] mutex_lock+0x1f/0x2f Mar 08 15:52:03 sh-101-19.int kernel: [] lookup_slow+0x33/0xa7 Mar 08 15:52:03 sh-101-19.int kernel: [] link_path_walk+0x80f/0x8b0 Mar 08 15:52:03 sh-101-19.int kernel: [] ? do_last+0x66d/0x12a0 Mar 08 15:52:03 sh-101-19.int kernel: [] path_openat+0xb5/0x640 Mar 08 15:52:03 sh-101-19.int kernel: [] ? user_path_at_empty+0x72/0xc0 Mar 08 15:52:03 sh-101-19.int kernel: [] do_filp_open+0x4d/0xb0 Mar 08 15:52:03 sh-101-19.int kernel: [] ? 
__alloc_fd+0x47/0x170 Mar 08 15:52:03 sh-101-19.int kernel: [] do_sys_open+0x137/0x240 Mar 08 15:52:03 sh-101-19.int kernel: [] SyS_openat+0x14/0x20 Mar 08 15:52:03 sh-101-19.int kernel: [] system_call_fastpath+0x22/0x27 Mar 08 15:54:02 sh-101-19.int kernel: Lustre: 89529:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552088589/real 1552088589] req@ffff9c04bd7aa100 x1626164621768640/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 600/2088 e 0 to 1 dl 1552089242 ref 2 fl Rpc:X/2/ffffffff rc 0/-1 Mar 08 15:54:02 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 08 15:54:02 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 08 15:54:03 sh-101-19.int kernel: INFO: task migratefs:173653 blocked for more than 120 seconds. Mar 08 15:54:03 sh-101-19.int kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Mar 08 15:54:03 sh-101-19.int kernel: migratefs D ffff9c05f880c100 0 173653 1 0x00000080 Mar 08 15:54:03 sh-101-19.int kernel: Call Trace: Mar 08 15:54:03 sh-101-19.int kernel: [] ? ll_get_acl+0x31/0xf0 [lustre] Mar 08 15:54:03 sh-101-19.int kernel: [] ? ll_dcompare+0x72/0x2e0 [lustre] Mar 08 15:54:03 sh-101-19.int kernel: [] schedule_preempt_disabled+0x29/0x70 Mar 08 15:54:03 sh-101-19.int kernel: [] __mutex_lock_slowpath+0xc7/0x1d0 Mar 08 15:54:03 sh-101-19.int kernel: [] mutex_lock+0x1f/0x2f Mar 08 15:54:03 sh-101-19.int kernel: [] lookup_slow+0x33/0xa7 Mar 08 15:54:03 sh-101-19.int kernel: [] link_path_walk+0x80f/0x8b0 Mar 08 15:54:03 sh-101-19.int kernel: [] ? do_last+0x66d/0x12a0 Mar 08 15:54:03 sh-101-19.int kernel: [] path_openat+0xb5/0x640 Mar 08 15:54:03 sh-101-19.int kernel: [] ? 
user_path_at_empty+0x72/0xc0 Mar 08 15:54:03 sh-101-19.int kernel: [] do_filp_open+0x4d/0xb0 Mar 08 15:54:03 sh-101-19.int kernel: [] ? __alloc_fd+0x47/0x170 Mar 08 15:54:03 sh-101-19.int kernel: [] do_sys_open+0x137/0x240 Mar 08 15:54:03 sh-101-19.int kernel: [] SyS_openat+0x14/0x20 Mar 08 15:54:03 sh-101-19.int kernel: [] system_call_fastpath+0x22/0x27 Mar 08 15:56:03 sh-101-19.int kernel: INFO: task migratefs:173653 blocked for more than 120 seconds. Mar 08 15:56:03 sh-101-19.int kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Mar 08 15:56:03 sh-101-19.int kernel: migratefs D ffff9c05f880c100 0 173653 1 0x00000080 Mar 08 15:56:03 sh-101-19.int kernel: Call Trace: Mar 08 15:56:03 sh-101-19.int kernel: [] ? ll_get_acl+0x31/0xf0 [lustre] Mar 08 15:56:03 sh-101-19.int kernel: [] ? ll_dcompare+0x72/0x2e0 [lustre] Mar 08 15:56:03 sh-101-19.int kernel: [] schedule_preempt_disabled+0x29/0x70 Mar 08 15:56:03 sh-101-19.int kernel: [] __mutex_lock_slowpath+0xc7/0x1d0 Mar 08 15:56:03 sh-101-19.int kernel: [] mutex_lock+0x1f/0x2f Mar 08 15:56:03 sh-101-19.int kernel: [] lookup_slow+0x33/0xa7 Mar 08 15:56:03 sh-101-19.int kernel: [] link_path_walk+0x80f/0x8b0 Mar 08 15:56:03 sh-101-19.int kernel: [] ? do_last+0x66d/0x12a0 Mar 08 15:56:03 sh-101-19.int kernel: [] path_openat+0xb5/0x640 Mar 08 15:56:03 sh-101-19.int kernel: [] ? user_path_at_empty+0x72/0xc0 Mar 08 15:56:03 sh-101-19.int kernel: [] do_filp_open+0x4d/0xb0 Mar 08 15:56:03 sh-101-19.int kernel: [] ? 
__alloc_fd+0x47/0x170 Mar 08 15:56:03 sh-101-19.int kernel: [] do_sys_open+0x137/0x240 Mar 08 15:56:03 sh-101-19.int kernel: [] SyS_openat+0x14/0x20 Mar 08 15:56:03 sh-101-19.int kernel: [] system_call_fastpath+0x22/0x27 Mar 08 15:57:04 sh-101-19.int kernel: Lustre: 173648:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552088823/real 1552088823] req@ffff9c04c1664e00 x1626164621713920/t0(0) o36->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 528/1752 e 2 to 1 dl 1552089424 ref 2 fl Rpc:X/2/ffffffff rc 0/-1 Mar 08 15:57:04 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 08 15:57:05 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 08 15:58:03 sh-101-19.int kernel: INFO: task migratefs:173653 blocked for more than 120 seconds. Mar 08 15:58:03 sh-101-19.int kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Mar 08 15:58:03 sh-101-19.int kernel: migratefs D ffff9c05f880c100 0 173653 1 0x00000080 Mar 08 15:58:03 sh-101-19.int kernel: Call Trace: Mar 08 15:58:03 sh-101-19.int kernel: [] ? ll_get_acl+0x31/0xf0 [lustre] Mar 08 15:58:03 sh-101-19.int kernel: [] ? ll_dcompare+0x72/0x2e0 [lustre] Mar 08 15:58:03 sh-101-19.int kernel: [] schedule_preempt_disabled+0x29/0x70 Mar 08 15:58:03 sh-101-19.int kernel: [] __mutex_lock_slowpath+0xc7/0x1d0 Mar 08 15:58:03 sh-101-19.int kernel: [] mutex_lock+0x1f/0x2f Mar 08 15:58:03 sh-101-19.int kernel: [] lookup_slow+0x33/0xa7 Mar 08 15:58:03 sh-101-19.int kernel: [] link_path_walk+0x80f/0x8b0 Mar 08 15:58:03 sh-101-19.int kernel: [] ? do_last+0x66d/0x12a0 Mar 08 15:58:03 sh-101-19.int kernel: [] path_openat+0xb5/0x640 Mar 08 15:58:03 sh-101-19.int kernel: [] ? 
user_path_at_empty+0x72/0xc0 Mar 08 15:58:03 sh-101-19.int kernel: [] do_filp_open+0x4d/0xb0 Mar 08 15:58:03 sh-101-19.int kernel: [] ? __alloc_fd+0x47/0x170 Mar 08 15:58:03 sh-101-19.int kernel: [] do_sys_open+0x137/0x240 Mar 08 15:58:03 sh-101-19.int kernel: [] SyS_openat+0x14/0x20 Mar 08 15:58:03 sh-101-19.int kernel: [] system_call_fastpath+0x22/0x27 Mar 08 16:00:03 sh-101-19.int kernel: INFO: task migratefs:173653 blocked for more than 120 seconds. Mar 08 16:00:03 sh-101-19.int kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Mar 08 16:00:03 sh-101-19.int kernel: migratefs D ffff9c05f880c100 0 173653 1 0x00000080 Mar 08 16:00:03 sh-101-19.int kernel: Call Trace: Mar 08 16:00:03 sh-101-19.int kernel: [] ? ll_get_acl+0x31/0xf0 [lustre] Mar 08 16:00:03 sh-101-19.int kernel: [] ? ll_dcompare+0x72/0x2e0 [lustre] Mar 08 16:00:03 sh-101-19.int kernel: [] schedule_preempt_disabled+0x29/0x70 Mar 08 16:00:03 sh-101-19.int kernel: [] __mutex_lock_slowpath+0xc7/0x1d0 Mar 08 16:00:03 sh-101-19.int kernel: [] mutex_lock+0x1f/0x2f Mar 08 16:00:03 sh-101-19.int kernel: [] lookup_slow+0x33/0xa7 Mar 08 16:00:03 sh-101-19.int kernel: [] link_path_walk+0x80f/0x8b0 Mar 08 16:00:03 sh-101-19.int kernel: [] ? do_last+0x66d/0x12a0 Mar 08 16:00:03 sh-101-19.int kernel: [] path_openat+0xb5/0x640 Mar 08 16:00:03 sh-101-19.int kernel: [] ? user_path_at_empty+0x72/0xc0 Mar 08 16:00:03 sh-101-19.int kernel: [] do_filp_open+0x4d/0xb0 Mar 08 16:00:03 sh-101-19.int kernel: [] ? __alloc_fd+0x47/0x170 Mar 08 16:00:03 sh-101-19.int kernel: [] do_sys_open+0x137/0x240 Mar 08 16:00:03 sh-101-19.int kernel: [] SyS_openat+0x14/0x20 Mar 08 16:00:03 sh-101-19.int kernel: [] system_call_fastpath+0x22/0x27 Mar 08 16:02:03 sh-101-19.int kernel: INFO: task migratefs:173653 blocked for more than 120 seconds. Mar 08 16:02:03 sh-101-19.int kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
Mar 08 16:02:03 sh-101-19.int kernel: migratefs D ffff9c05f880c100 0 173653 1 0x00000080 Mar 08 16:02:03 sh-101-19.int kernel: Call Trace: Mar 08 16:02:03 sh-101-19.int kernel: [] ? ll_get_acl+0x31/0xf0 [lustre] Mar 08 16:02:03 sh-101-19.int kernel: [] ? ll_dcompare+0x72/0x2e0 [lustre] Mar 08 16:02:03 sh-101-19.int kernel: [] schedule_preempt_disabled+0x29/0x70 Mar 08 16:02:03 sh-101-19.int kernel: [] __mutex_lock_slowpath+0xc7/0x1d0 Mar 08 16:02:03 sh-101-19.int kernel: [] mutex_lock+0x1f/0x2f Mar 08 16:02:03 sh-101-19.int kernel: [] lookup_slow+0x33/0xa7 Mar 08 16:02:03 sh-101-19.int kernel: [] link_path_walk+0x80f/0x8b0 Mar 08 16:02:03 sh-101-19.int kernel: [] ? do_last+0x66d/0x12a0 Mar 08 16:02:03 sh-101-19.int kernel: [] path_openat+0xb5/0x640 Mar 08 16:02:03 sh-101-19.int kernel: [] ? user_path_at_empty+0x72/0xc0 Mar 08 16:02:03 sh-101-19.int kernel: [] do_filp_open+0x4d/0xb0 Mar 08 16:02:03 sh-101-19.int kernel: [] ? __alloc_fd+0x47/0x170 Mar 08 16:02:03 sh-101-19.int kernel: [] do_sys_open+0x137/0x240 Mar 08 16:02:03 sh-101-19.int kernel: [] SyS_openat+0x14/0x20 Mar 08 16:02:03 sh-101-19.int kernel: [] system_call_fastpath+0x22/0x27 Mar 08 16:04:03 sh-101-19.int kernel: INFO: task migratefs:173653 blocked for more than 120 seconds. Mar 08 16:04:03 sh-101-19.int kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Mar 08 16:04:03 sh-101-19.int kernel: migratefs D ffff9c05f880c100 0 173653 1 0x00000080 Mar 08 16:04:03 sh-101-19.int kernel: Call Trace: Mar 08 16:04:03 sh-101-19.int kernel: [] ? ll_get_acl+0x31/0xf0 [lustre] Mar 08 16:04:03 sh-101-19.int kernel: [] ? 
ll_dcompare+0x72/0x2e0 [lustre] Mar 08 16:04:03 sh-101-19.int kernel: [] schedule_preempt_disabled+0x29/0x70 Mar 08 16:04:03 sh-101-19.int kernel: [] __mutex_lock_slowpath+0xc7/0x1d0 Mar 08 16:04:03 sh-101-19.int kernel: [] mutex_lock+0x1f/0x2f Mar 08 16:04:03 sh-101-19.int kernel: [] lookup_slow+0x33/0xa7 Mar 08 16:04:03 sh-101-19.int kernel: [] link_path_walk+0x80f/0x8b0 Mar 08 16:04:03 sh-101-19.int kernel: [] ? do_last+0x66d/0x12a0 Mar 08 16:04:03 sh-101-19.int kernel: [] path_openat+0xb5/0x640 Mar 08 16:04:03 sh-101-19.int kernel: [] ? user_path_at_empty+0x72/0xc0 Mar 08 16:04:03 sh-101-19.int kernel: [] do_filp_open+0x4d/0xb0 Mar 08 16:04:03 sh-101-19.int kernel: [] ? __alloc_fd+0x47/0x170 Mar 08 16:04:03 sh-101-19.int kernel: [] do_sys_open+0x137/0x240 Mar 08 16:04:03 sh-101-19.int kernel: [] SyS_openat+0x14/0x20 Mar 08 16:04:03 sh-101-19.int kernel: [] system_call_fastpath+0x22/0x27 Mar 08 16:04:55 sh-101-19.int kernel: Lustre: 89529:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552089242/real 1552089242] req@ffff9c04bd7aa100 x1626164621768640/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 600/2088 e 0 to 1 dl 1552089895 ref 2 fl Rpc:X/2/ffffffff rc 0/-1 Mar 08 16:04:55 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 08 16:04:55 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 08 16:06:03 sh-101-19.int kernel: INFO: task migratefs:173653 blocked for more than 120 seconds. Mar 08 16:06:03 sh-101-19.int kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
Mar 08 16:06:03 sh-101-19.int kernel: migratefs D ffff9c05f880c100 0 173653 1 0x00000080 Mar 08 16:06:03 sh-101-19.int kernel: Call Trace: Mar 08 16:06:03 sh-101-19.int kernel: [] ? ll_get_acl+0x31/0xf0 [lustre] Mar 08 16:06:03 sh-101-19.int kernel: [] ? ll_dcompare+0x72/0x2e0 [lustre] Mar 08 16:06:03 sh-101-19.int kernel: [] schedule_preempt_disabled+0x29/0x70 Mar 08 16:06:03 sh-101-19.int kernel: [] __mutex_lock_slowpath+0xc7/0x1d0 Mar 08 16:06:04 sh-101-19.int kernel: [] mutex_lock+0x1f/0x2f Mar 08 16:06:04 sh-101-19.int kernel: [] lookup_slow+0x33/0xa7 Mar 08 16:06:04 sh-101-19.int kernel: [] link_path_walk+0x80f/0x8b0 Mar 08 16:06:04 sh-101-19.int kernel: [] ? do_last+0x66d/0x12a0 Mar 08 16:06:04 sh-101-19.int kernel: [] path_openat+0xb5/0x640 Mar 08 16:06:04 sh-101-19.int kernel: [] ? user_path_at_empty+0x72/0xc0 Mar 08 16:06:04 sh-101-19.int kernel: [] do_filp_open+0x4d/0xb0 Mar 08 16:06:04 sh-101-19.int kernel: [] ? __alloc_fd+0x47/0x170 Mar 08 16:06:04 sh-101-19.int kernel: [] do_sys_open+0x137/0x240 Mar 08 16:06:04 sh-101-19.int kernel: [] SyS_openat+0x14/0x20 Mar 08 16:06:04 sh-101-19.int kernel: [] system_call_fastpath+0x22/0x27 Mar 08 16:07:06 sh-101-19.int kernel: Lustre: 173648:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552089425/real 1552089425] req@ffff9c04c1664e00 x1626164621713920/t0(0) o36->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 528/1752 e 2 to 1 dl 1552090026 ref 2 fl Rpc:X/2/ffffffff rc 0/-1 Mar 08 16:07:06 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 08 16:07:06 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 08 16:08:04 sh-101-19.int kernel: INFO: task migratefs:173653 blocked for more than 120 seconds. 
Mar 08 16:08:04 sh-101-19.int kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Mar 08 16:08:04 sh-101-19.int kernel: migratefs D ffff9c05f880c100 0 173653 1 0x00000080 Mar 08 16:08:04 sh-101-19.int kernel: Call Trace: Mar 08 16:08:04 sh-101-19.int kernel: [] ? ll_get_acl+0x31/0xf0 [lustre] Mar 08 16:08:04 sh-101-19.int kernel: [] ? ll_dcompare+0x72/0x2e0 [lustre] Mar 08 16:08:04 sh-101-19.int kernel: [] schedule_preempt_disabled+0x29/0x70 Mar 08 16:08:04 sh-101-19.int kernel: [] __mutex_lock_slowpath+0xc7/0x1d0 Mar 08 16:08:04 sh-101-19.int kernel: [] mutex_lock+0x1f/0x2f Mar 08 16:08:04 sh-101-19.int kernel: [] lookup_slow+0x33/0xa7 Mar 08 16:08:04 sh-101-19.int kernel: [] link_path_walk+0x80f/0x8b0 Mar 08 16:08:04 sh-101-19.int kernel: [] ? do_last+0x66d/0x12a0 Mar 08 16:08:04 sh-101-19.int kernel: [] path_openat+0xb5/0x640 Mar 08 16:08:04 sh-101-19.int kernel: [] ? user_path_at_empty+0x72/0xc0 Mar 08 16:08:04 sh-101-19.int kernel: [] do_filp_open+0x4d/0xb0 Mar 08 16:08:04 sh-101-19.int kernel: [] ? __alloc_fd+0x47/0x170 Mar 08 16:08:04 sh-101-19.int kernel: [] do_sys_open+0x137/0x240 Mar 08 16:08:04 sh-101-19.int kernel: [] SyS_openat+0x14/0x20 Mar 08 16:08:04 sh-101-19.int kernel: [] system_call_fastpath+0x22/0x27 Mar 08 16:10:04 sh-101-19.int kernel: INFO: task migratefs:173653 blocked for more than 120 seconds. Mar 08 16:10:04 sh-101-19.int kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Mar 08 16:10:04 sh-101-19.int kernel: migratefs D ffff9c05f880c100 0 173653 1 0x00000080 Mar 08 16:10:04 sh-101-19.int kernel: Call Trace: Mar 08 16:10:04 sh-101-19.int kernel: [] ? ll_get_acl+0x31/0xf0 [lustre] Mar 08 16:10:04 sh-101-19.int kernel: [] ? 
ll_dcompare+0x72/0x2e0 [lustre] Mar 08 16:10:04 sh-101-19.int kernel: [] schedule_preempt_disabled+0x29/0x70 Mar 08 16:10:04 sh-101-19.int kernel: [] __mutex_lock_slowpath+0xc7/0x1d0 Mar 08 16:10:04 sh-101-19.int kernel: [] mutex_lock+0x1f/0x2f Mar 08 16:10:04 sh-101-19.int kernel: [] lookup_slow+0x33/0xa7 Mar 08 16:10:04 sh-101-19.int kernel: [] link_path_walk+0x80f/0x8b0 Mar 08 16:10:04 sh-101-19.int kernel: [] ? do_last+0x66d/0x12a0 Mar 08 16:10:04 sh-101-19.int kernel: [] path_openat+0xb5/0x640 Mar 08 16:10:04 sh-101-19.int kernel: [] ? user_path_at_empty+0x72/0xc0 Mar 08 16:10:04 sh-101-19.int kernel: [] do_filp_open+0x4d/0xb0 Mar 08 16:10:04 sh-101-19.int kernel: [] ? __alloc_fd+0x47/0x170 Mar 08 16:10:04 sh-101-19.int kernel: [] do_sys_open+0x137/0x240 Mar 08 16:10:04 sh-101-19.int kernel: [] SyS_openat+0x14/0x20 Mar 08 16:10:04 sh-101-19.int kernel: [] system_call_fastpath+0x22/0x27 Mar 08 16:17:07 sh-101-19.int kernel: Lustre: 173648:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552090026/real 1552090026] req@ffff9c04c1664e00 x1626164621713920/t0(0) o36->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 528/1752 e 2 to 1 dl 1552090627 ref 2 fl Rpc:X/2/ffffffff rc 0/-1 Mar 08 16:17:07 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 08 16:17:07 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 08 16:27:08 sh-101-19.int kernel: Lustre: 173648:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552090627/real 1552090627] req@ffff9c04c1664e00 x1626164621713920/t0(0) o36->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 528/1752 e 2 to 1 dl 1552091228 ref 2 fl Rpc:X/2/ffffffff rc -11/-1 Mar 08 16:27:08 
sh-101-19.int kernel: Lustre: 173648:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Mar 08 16:27:08 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 08 16:27:08 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 08 16:27:08 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 08 16:27:08 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 08 16:33:22 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection to fir-MDT0000 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 08 16:33:22 sh-101-19.int kernel: Lustre: Skipped 2 previous similar messages Mar 08 16:33:23 sh-101-19.int kernel: LustreError: 11-0: fir-MDT0003-mdc-ffff9c05b8776000: operation mds_reint to node 10.0.10.52@o2ib7 failed: rc = -19 Mar 08 16:34:44 sh-101-19.int kernel: LustreError: 93465:0:(client.c:3023:ptlrpc_replay_interpret()) @@@ status 301, old was 0 req@ffff9c01e6558f00 x1626164621713792/t63693413164(63693413164) o101->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 1768/560 e 0 to 0 dl 1552092440 ref 2 fl Interpret:RP/4/0 rc 301/301 Mar 08 16:34:44 sh-101-19.int kernel: LustreError: 93465:0:(client.c:3023:ptlrpc_replay_interpret()) Skipped 11 previous similar messages Mar 08 16:35:27 sh-101-19.int kernel: Lustre: fir-MDT0002-mdc-ffff9c05b8776000: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7) Mar 08 16:35:27 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 08 18:54:41 sh-101-19.int kernel: NFS: nfs4_reclaim_open_state: unhandled error -521 Mar 08 18:56:37 sh-101-19.int kernel: NFS: nfs4_reclaim_open_state: unhandled error -521 Mar 08 19:07:45 sh-101-19.int kernel: NFS: 
nfs4_reclaim_open_state: unhandled error -521 Mar 08 19:07:46 sh-101-19.int kernel: NFS: nfs4_reclaim_open_state: unhandled error -521 Mar 08 21:38:07 sh-101-19.int kernel: Lustre: 173653:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552109131/real 1552109131] req@ffff9bf8ad593f00 x1626164625282624/t0(0) o36->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 528/1752 e 0 to 1 dl 1552109887 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Mar 08 21:38:07 sh-101-19.int kernel: Lustre: 173653:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Mar 08 21:38:07 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 08 21:38:07 sh-101-19.int kernel: Lustre: Skipped 2 previous similar messages Mar 08 21:38:07 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 08 21:38:07 sh-101-19.int kernel: Lustre: Skipped 3 previous similar messages Mar 08 21:50:43 sh-101-19.int kernel: Lustre: 45604:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552109887/real 1552109887] req@ffff9c02bc7fd700 x1626164625337696/t0(0) o101->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 632/2088 e 0 to 1 dl 1552110643 ref 2 fl Rpc:X/2/ffffffff rc -11/-1 Mar 08 21:50:43 sh-101-19.int kernel: Lustre: 173653:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552109887/real 1552109887] req@ffff9bf8ad593f00 x1626164625282624/t0(0) o36->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 528/1752 e 0 to 1 dl 1552110643 ref 2 fl Rpc:X/2/ffffffff rc -11/-1 Mar 08 21:50:43 sh-101-19.int kernel: Lustre: 173653:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message 
Mar 08 21:50:43 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 08 21:50:43 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 08 21:51:43 sh-101-19.int kernel: NFS: nfs4_reclaim_open_state: unhandled error -521 Mar 08 22:02:13 sh-101-19.int kernel: NFS: nfs4_reclaim_open_state: unhandled error -521 Mar 08 22:02:18 sh-101-19.int kernel: NFS: nfs4_reclaim_open_state: unhandled error -521 Mar 08 22:03:19 sh-101-19.int kernel: Lustre: 173653:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552110643/real 1552110643] req@ffff9bf8ad593f00 x1626164625282624/t0(0) o36->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 528/1752 e 0 to 1 dl 1552111399 ref 2 fl Rpc:X/2/ffffffff rc -11/-1 Mar 08 22:03:19 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 08 22:03:19 sh-101-19.int kernel: LustreError: 90626:0:(lmv_obd.c:1412:lmv_statfs()) can't stat MDS #0 (fir-MDT0003-mdc-ffff9c05b8776000), error -4 Mar 08 22:03:19 sh-101-19.int kernel: LustreError: 90626:0:(llite_lib.c:1807:ll_statfs_internal()) md_statfs fails: rc = -4 Mar 08 22:03:19 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 08 22:03:19 sh-101-19.int kernel: Lustre: 173653:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Mar 08 22:07:42 sh-101-19.int kernel: Lustre: 45305:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552110905/real 1552110905] req@ffff9bf864535400 x1626164625485280/t0(0) 
o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 1792/10616 e 0 to 1 dl 1552111662 ref 2 fl Rpc:XP/0/ffffffff rc 0/-1
Mar 08 22:07:42 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 08 22:07:42 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 08 22:15:55 sh-101-19.int kernel: Lustre: 173648:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552111399/real 1552111399] req@ffff9bede6c78000 x1626164625485456/t0(0) o101->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 1768/10616 e 0 to 1 dl 1552112155 ref 2 fl Rpc:XP/2/ffffffff rc -11/-1
Mar 08 22:15:55 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 08 22:15:55 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 08 22:15:55 sh-101-19.int kernel: Lustre: 173648:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Mar 08 22:28:31 sh-101-19.int kernel: Lustre: 173653:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552112155/real 1552112155] req@ffff9bf8ad593f00 x1626164625282624/t0(0) o36->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 528/1752 e 0 to 1 dl 1552112911 ref 2 fl Rpc:X/2/ffffffff rc -11/-1
Mar 08 22:28:31 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 08 22:28:31 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 08 22:38:38 sh-101-19.int kernel: Lustre: 45306:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552112762/real 1552112762] req@ffff9bffc6f8f800 x1626164625706208/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 576/2088 e 0 to 1 dl 1552113518 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
Mar 08 22:38:38 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 08 22:38:38 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 08 22:41:07 sh-101-19.int kernel: Lustre: 173652:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552112911/real 1552112911] req@ffff9bf667432a00 x1626164625664528/t0(0) o49->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 464/1704 e 0 to 1 dl 1552113667 ref 2 fl Rpc:X/2/ffffffff rc -11/-1
Mar 08 22:41:07 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 08 22:41:07 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 08 22:51:24 sh-101-19.int kernel: Lustre: 173649:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552113528/real 1552113528] req@ffff9bf8bb709b00 x1626164625817280/t0(0) o36->fir-MDT0000-mdc-ffff9c05b8776000@10.0.10.51@o2ib7:12/10 lens 784/1752 e 0 to 1 dl 1552114284 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
Mar 08 22:51:24 sh-101-19.int kernel: Lustre: 173649:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Mar 08 22:51:24 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection to fir-MDT0000 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 08 22:51:24 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7)
Mar 08 22:53:43 sh-101-19.int kernel: Lustre: 95394:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552113667/real 1552113667] req@ffff9beb6e277500 x1626164625807248/t0(0) o41->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 440/368 e 0 to 1 dl 1552114423 ref 2 fl Rpc:IX/2/ffffffff rc -11/-1
Mar 08 22:53:43 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 08 22:53:43 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 08 22:53:43 sh-101-19.int kernel: Lustre: 95394:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Mar 08 23:04:00 sh-101-19.int kernel: Lustre: 173649:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552114284/real 1552114284] req@ffff9bf8bb709b00 x1626164625817280/t0(0) o36->fir-MDT0000-mdc-ffff9c05b8776000@10.0.10.51@o2ib7:12/10 lens 784/1752 e 0 to 1 dl 1552115040 ref 2 fl Rpc:X/2/ffffffff rc 0/-1
Mar 08 23:04:00 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection to fir-MDT0000 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 08 23:04:00 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7)
Mar 08 23:06:19 sh-101-19.int kernel: Lustre: 173652:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552114423/real 1552114423] req@ffff9bf667432a00 x1626164625664528/t0(0) o49->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 464/1704 e 0 to 1 dl 1552115179 ref 2 fl Rpc:X/2/ffffffff rc -11/-1
Mar 08 23:06:19 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 08 23:06:19 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 08 23:06:19 sh-101-19.int kernel: Lustre: 173652:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Mar 08 23:16:36 sh-101-19.int kernel: Lustre: 173649:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552115040/real 1552115040] req@ffff9bf8bb709b00 x1626164625817280/t0(0) o36->fir-MDT0000-mdc-ffff9c05b8776000@10.0.10.51@o2ib7:12/10 lens 784/1752 e 0 to 1 dl 1552115796 ref 2 fl Rpc:X/2/ffffffff rc -11/-1
Mar 08 23:16:36 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection to fir-MDT0000 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 08 23:16:36 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7)
Mar 08 23:18:55 sh-101-19.int kernel: Lustre: 173653:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552115179/real 1552115179] req@ffff9bf8ad593f00 x1626164625282624/t0(0) o36->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 528/1752 e 0 to 1 dl 1552115935 ref 2 fl Rpc:X/2/ffffffff rc -11/-1
Mar 08 23:18:55 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 08 23:18:55 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 08 23:18:55 sh-101-19.int kernel: Lustre: 173653:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Mar 08 23:29:12 sh-101-19.int kernel: Lustre: 173649:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552115796/real 1552115796] req@ffff9bf8bb709b00 x1626164625817280/t0(0) o36->fir-MDT0000-mdc-ffff9c05b8776000@10.0.10.51@o2ib7:12/10 lens 784/1752 e 0 to 1 dl 1552116552 ref 2 fl Rpc:X/2/ffffffff rc -11/-1
Mar 08 23:29:12 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection to fir-MDT0000 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 08 23:29:12 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7)
Mar 08 23:31:31 sh-101-19.int kernel: Lustre: 173652:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552115935/real 1552115935] req@ffff9bf667432a00 x1626164625664528/t0(0) o49->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 464/1704 e 0 to 1 dl 1552116691 ref 2 fl Rpc:X/2/ffffffff rc -11/-1
Mar 08 23:31:31 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 08 23:31:31 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 08 23:31:31 sh-101-19.int kernel: Lustre: 173652:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Mar 08 23:41:48 sh-101-19.int kernel: Lustre: 173649:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552116552/real 1552116552] req@ffff9bf8bb709b00 x1626164625817280/t0(0) o36->fir-MDT0000-mdc-ffff9c05b8776000@10.0.10.51@o2ib7:12/10 lens 784/1752 e 0 to 1 dl 1552117308 ref 2 fl Rpc:X/2/ffffffff rc -11/-1
Mar 08 23:41:48 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection to fir-MDT0000 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 08 23:41:48 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7)
Mar 08 23:44:07 sh-101-19.int kernel: Lustre: 191779:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552116691/real 1552116691] req@ffff9bf864762d00 x1626164625918976/t0(0) o101->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 584/2088 e 0 to 1 dl 1552117447 ref 2 fl Rpc:X/2/ffffffff rc -11/-1
Mar 08 23:44:07 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 08 23:44:07 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 08 23:44:07 sh-101-19.int kernel: Lustre: 191779:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
Mar 08 23:54:24 sh-101-19.int kernel: Lustre: 173649:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552117308/real 1552117308] req@ffff9bf8bb709b00 x1626164625817280/t0(0) o36->fir-MDT0000-mdc-ffff9c05b8776000@10.0.10.51@o2ib7:12/10 lens 784/1752 e 0 to 1 dl 1552118064 ref 2 fl Rpc:X/2/ffffffff rc -11/-1
Mar 08 23:54:24 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection to fir-MDT0000 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 08 23:54:24 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7)
Mar 08 23:56:43 sh-101-19.int kernel: Lustre: 173653:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552117447/real 1552117447] req@ffff9bf8ad593f00 x1626164625282624/t0(0) o36->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 528/1752 e 0 to 1 dl 1552118203 ref 2 fl Rpc:X/2/ffffffff rc -11/-1
Mar 08 23:56:43 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 08 23:56:43 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 08 23:56:43 sh-101-19.int kernel: Lustre: 173653:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Mar 09 00:07:00 sh-101-19.int kernel: Lustre: 173649:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552118064/real 1552118064] req@ffff9bf8bb709b00 x1626164625817280/t0(0) o36->fir-MDT0000-mdc-ffff9c05b8776000@10.0.10.51@o2ib7:12/10 lens 784/1752 e 0 to 1 dl 1552118820 ref 2 fl Rpc:X/2/ffffffff rc -11/-1
Mar 09 00:07:00 sh-101-19.int kernel: Lustre: 173649:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Mar 09 00:07:00 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection to fir-MDT0000 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 09 00:07:00 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Mar 09 00:07:00 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7)
Mar 09 00:07:00 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Mar 09 00:09:19 sh-101-19.int kernel: Lustre: 101138:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552118203/real 1552118203] req@ffff9bfd91ad6300 x1626164626227120/t0(0) o41->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 440/368 e 0 to 1 dl 1552118959 ref 2 fl Rpc:IX/2/ffffffff rc -11/-1
Mar 09 00:09:19 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 09 00:09:19 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 09 00:09:19 sh-101-19.int kernel: Lustre: 101138:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Mar 09 00:19:36 sh-101-19.int kernel: Lustre: 173649:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552118820/real 1552118820] req@ffff9bf8bb709b00 x1626164625817280/t0(0) o36->fir-MDT0000-mdc-ffff9c05b8776000@10.0.10.51@o2ib7:12/10 lens 784/1752 e 0 to 1 dl 1552119576 ref 2 fl Rpc:X/2/ffffffff rc -11/-1
Mar 09 00:19:36 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection to fir-MDT0000 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 09 00:19:36 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7)
Mar 09 00:21:55 sh-101-19.int kernel: Lustre: 151528:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552118959/real 1552118959] req@ffff9c007d216000 x1626164626261392/t0(0) o101->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 584/2088 e 0 to 1 dl 1552119715 ref 2 fl Rpc:X/2/ffffffff rc -11/-1
Mar 09 00:21:55 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 09 00:21:55 sh-101-19.int kernel: LustreError: 103689:0:(lmv_obd.c:1412:lmv_statfs()) can't stat MDS #0 (fir-MDT0003-mdc-ffff9c05b8776000), error -4
Mar 09 00:21:55 sh-101-19.int kernel: LustreError: 103689:0:(llite_lib.c:1807:ll_statfs_internal()) md_statfs fails: rc = -4
Mar 09 00:21:55 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 09 00:21:55 sh-101-19.int kernel: Lustre: 151528:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages
Mar 09 00:32:12 sh-101-19.int kernel: Lustre: 173649:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552119576/real 1552119576] req@ffff9bf8bb709b00 x1626164625817280/t0(0) o36->fir-MDT0000-mdc-ffff9c05b8776000@10.0.10.51@o2ib7:12/10 lens 784/1752 e 0 to 1 dl 1552120332 ref 2 fl Rpc:X/2/ffffffff rc -11/-1
Mar 09 00:32:12 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection to fir-MDT0000 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 09 00:32:12 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7)
Mar 09 00:34:31 sh-101-19.int kernel: Lustre: 101138:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552119715/real 1552119715] req@ffff9bfd91ad6300 x1626164626227120/t0(0) o41->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 440/368 e 0 to 1 dl 1552120471 ref 2 fl Rpc:IX/2/ffffffff rc -11/-1
Mar 09 00:34:31 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 09 00:34:31 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 09 00:34:31 sh-101-19.int kernel: Lustre: 101138:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
Mar 09 00:44:48 sh-101-19.int kernel: Lustre: 173649:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552120332/real 1552120332] req@ffff9bf8bb709b00 x1626164625817280/t0(0) o36->fir-MDT0000-mdc-ffff9c05b8776000@10.0.10.51@o2ib7:12/10 lens 784/1752 e 0 to 1 dl 1552121088 ref 2 fl Rpc:X/2/ffffffff rc -11/-1
Mar 09 00:44:48 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection to fir-MDT0000 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 09 00:44:48 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7)
Mar 09 00:47:07 sh-101-19.int kernel: Lustre: 45604:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552120471/real 1552120471] req@ffff9c058764b900 x1626164626351968/t0(0) o101->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 576/2088 e 0 to 1 dl 1552121227 ref 2 fl Rpc:X/2/ffffffff rc -11/-1
Mar 09 00:47:07 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 09 00:47:07 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 09 00:47:07 sh-101-19.int kernel: Lustre: 45604:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
Mar 09 00:57:24 sh-101-19.int kernel: Lustre: 120332:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552121088/real 1552121088] req@ffff9bfa17bb0f00 x1626164626666768/t0(0) o36->fir-MDT0000-mdc-ffff9c05b8776000@10.0.10.51@o2ib7:12/10 lens 528/1752 e 0 to 1 dl 1552121844 ref 2 fl Rpc:X/2/ffffffff rc -11/-1
Mar 09 00:57:24 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection to fir-MDT0000 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 09 00:57:24 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7)
Mar 09 00:57:24 sh-101-19.int kernel: Lustre: 120332:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Mar 09 00:59:44 sh-101-19.int kernel: Lustre: 151528:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552121227/real 1552121227] req@ffff9c007d216000 x1626164626261392/t0(0) o101->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 584/2088 e 0 to 1 dl 1552121983 ref 2 fl Rpc:X/2/ffffffff rc -11/-1
Mar 09 00:59:44 sh-101-19.int kernel: Lustre: 101138:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552121227/real 1552121227] req@ffff9bfd91ad6300 x1626164626227120/t0(0) o41->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 440/368 e 0 to 1 dl 1552121983 ref 2 fl Rpc:IX/2/ffffffff rc -11/-1
Mar 09 00:59:44 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 09 00:59:44 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 09 00:59:44 sh-101-19.int kernel: LustreError: 101138:0:(lmv_obd.c:1412:lmv_statfs()) can't stat MDS #0 (fir-MDT0003-mdc-ffff9c05b8776000), error -11
Mar 09 00:59:44 sh-101-19.int kernel: LustreError: 101138:0:(llite_lib.c:1807:ll_statfs_internal()) md_statfs fails: rc = -11
Mar 09 00:59:44 sh-101-19.int kernel: Lustre: 151528:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Mar 09 01:10:00 sh-101-19.int kernel: Lustre: 173649:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552121844/real 1552121844] req@ffff9bf8bb709b00 x1626164625817280/t0(0) o36->fir-MDT0000-mdc-ffff9c05b8776000@10.0.10.51@o2ib7:12/10 lens 784/1752 e 0 to 1 dl 1552122600 ref 2 fl Rpc:X/2/ffffffff rc -11/-1
Mar 09 01:10:00 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection to fir-MDT0000 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 09 01:10:00 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7)
Mar 09 01:12:19 sh-101-19.int kernel: Lustre: 45604:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552121983/real 1552121983] req@ffff9c058764b900 x1626164626351968/t0(0) o101->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 576/2088 e 0 to 1 dl 1552122739 ref 2 fl Rpc:X/2/ffffffff rc -11/-1
Mar 09 01:12:19 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 09 01:12:19 sh-101-19.int kernel: LustreError: 108457:0:(lmv_obd.c:1412:lmv_statfs()) can't stat MDS #0 (fir-MDT0003-mdc-ffff9c05b8776000), error -4
Mar 09 01:12:19 sh-101-19.int kernel: LustreError: 108457:0:(llite_lib.c:1807:ll_statfs_internal()) md_statfs fails: rc = -4
Mar 09 01:12:19 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 09 01:12:19 sh-101-19.int kernel: Lustre: 45604:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
Mar 09 01:22:36 sh-101-19.int kernel: Lustre: 120332:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552122600/real 1552122600] req@ffff9bfa17bb0f00 x1626164626666768/t0(0) o36->fir-MDT0000-mdc-ffff9c05b8776000@10.0.10.51@o2ib7:12/10 lens 528/1752 e 0 to 1 dl 1552123356 ref 2 fl Rpc:X/2/ffffffff rc -11/-1
Mar 09 01:22:36 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection to fir-MDT0000 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 09 01:22:36 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7)
Mar 09 01:22:36 sh-101-19.int kernel: Lustre: 120332:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Mar 09 01:24:55 sh-101-19.int kernel: Lustre: 173653:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552122739/real 1552122739] req@ffff9bf8ad593f00 x1626164625282624/t0(0) o36->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 528/1752 e 0 to 1 dl 1552123495 ref 2 fl Rpc:X/2/ffffffff rc -11/-1
Mar 09 01:24:55 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 09 01:24:55 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 09 01:24:55 sh-101-19.int kernel: Lustre: 173653:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Mar 09 01:35:12 sh-101-19.int kernel: Lustre: 173649:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552123356/real 1552123356] req@ffff9bf8bb709b00 x1626164625817280/t0(0) o36->fir-MDT0000-mdc-ffff9c05b8776000@10.0.10.51@o2ib7:12/10 lens 784/1752 e 0 to 1 dl 1552124112 ref 2 fl Rpc:X/2/ffffffff rc -11/-1
Mar 09 01:35:12 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection to fir-MDT0000 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 09 01:35:12 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7)
Mar 09 01:37:31 sh-101-19.int kernel: Lustre: 45604:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552123495/real 1552123495] req@ffff9c058764b900 x1626164626351968/t0(0) o101->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 576/2088 e 0 to 1 dl 1552124251 ref 2 fl Rpc:X/2/ffffffff rc -11/-1
Mar 09 01:37:31 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 09 01:37:31 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 09 01:37:31 sh-101-19.int kernel: Lustre: 45604:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Mar 09 01:47:49 sh-101-19.int kernel: Lustre: 120332:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552124112/real 1552124112] req@ffff9bfa17bb0f00 x1626164626666768/t0(0) o36->fir-MDT0000-mdc-ffff9c05b8776000@10.0.10.51@o2ib7:12/10 lens 528/1752 e 0 to 1 dl 1552124868 ref 2 fl Rpc:X/2/ffffffff rc -11/-1
Mar 09 01:47:49 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection to fir-MDT0000 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 09 01:47:49 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7)
Mar 09 01:50:07 sh-101-19.int kernel: Lustre: 151528:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552124251/real 1552124251] req@ffff9c007d216000 x1626164626261392/t0(0) o101->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 584/2088 e 0 to 1 dl 1552125007 ref 2 fl Rpc:X/2/ffffffff rc -11/-1
Mar 09 01:50:07 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 09 01:50:07 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 09 01:50:07 sh-101-19.int kernel: Lustre: 151528:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
Mar 09 02:00:25 sh-101-19.int kernel: Lustre: 120332:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552124869/real 1552124869] req@ffff9bfa17bb0f00 x1626164626666768/t0(0) o36->fir-MDT0000-mdc-ffff9c05b8776000@10.0.10.51@o2ib7:12/10 lens 528/1752 e 0 to 1 dl 1552125625 ref 2 fl Rpc:X/2/ffffffff rc -11/-1
Mar 09 02:00:25 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection to fir-MDT0000 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 09 02:00:25 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7)
Mar 09 02:00:25 sh-101-19.int kernel: Lustre: 120332:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Mar 09 02:00:40 sh-101-19.int kernel: Lustre: 113844:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552125039/real 1552125039] req@ffff9c02bc596c00 x1626164627140192/t0(0) o41->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 440/368 e 1 to 1 dl 1552125640 ref 2 fl Rpc:IX/0/ffffffff rc 0/-1
Mar 09 02:00:40 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 09 02:00:40 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 09 02:10:41 sh-101-19.int kernel: Lustre: 113844:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552125640/real 1552125640] req@ffff9c02bc596c00 x1626164627140192/t0(0) o41->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 440/368 e 1 to 1 dl 1552126241 ref 2 fl Rpc:IX/2/ffffffff rc -11/-1
Mar 09 02:10:41 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 09 02:10:41 sh-101-19.int kernel: LustreError: 113844:0:(lmv_obd.c:1412:lmv_statfs()) can't stat MDS #0 (fir-MDT0003-mdc-ffff9c05b8776000), error -4
Mar 09 02:10:41 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 09 02:10:41 sh-101-19.int kernel: LustreError: 113844:0:(llite_lib.c:1807:ll_statfs_internal()) md_statfs fails: rc = -4
Mar 09 02:13:01 sh-101-19.int kernel: Lustre: 173649:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552125625/real 1552125625] req@ffff9bf8bb709b00 x1626164625817280/t0(0) o36->fir-MDT0000-mdc-ffff9c05b8776000@10.0.10.51@o2ib7:12/10 lens 784/1752 e 0 to 1 dl 1552126381 ref 2 fl Rpc:X/2/ffffffff rc -11/-1
Mar 09 02:13:01 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection to fir-MDT0000 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 09 02:13:01 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7)
Mar 09 02:23:17 sh-101-19.int kernel: Lustre: 114600:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552126241/real 1552126241] req@ffff9bfa49e4da00 x1626164627192592/t0(0) o41->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 440/368 e 0 to 1 dl 1552126997 ref 2 fl Rpc:IX/2/ffffffff rc -11/-1
Mar 09 02:23:17 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 09 02:23:17 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 09 02:23:17 sh-101-19.int kernel: Lustre: 114600:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
Mar 09 02:25:37 sh-101-19.int kernel: Lustre: 120332:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552126381/real 1552126381] req@ffff9bfa17bb0f00 x1626164626666768/t0(0) o36->fir-MDT0000-mdc-ffff9c05b8776000@10.0.10.51@o2ib7:12/10 lens 528/1752 e 0 to 1 dl 1552127137 ref 2 fl Rpc:X/2/ffffffff rc -11/-1
Mar 09 02:25:37 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection to fir-MDT0000 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 09 02:25:37 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7)
Mar 09 02:25:37 sh-101-19.int kernel: Lustre: 120332:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Mar 09 02:35:53 sh-101-19.int kernel: Lustre: 151528:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552126997/real 1552126997] req@ffff9c007d216000 x1626164626261392/t0(0) o101->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 584/2088 e 0 to 1 dl 1552127753 ref 2 fl Rpc:X/2/ffffffff rc -11/-1
Mar 09 02:35:53 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 09 02:35:53 sh-101-19.int kernel: LustreError: 116110:0:(lmv_obd.c:1412:lmv_statfs()) can't stat MDS #0 (fir-MDT0003-mdc-ffff9c05b8776000), error -4
Mar 09 02:35:53 sh-101-19.int kernel: LustreError: 116110:0:(llite_lib.c:1807:ll_statfs_internal()) md_statfs fails: rc = -4
Mar 09 02:35:53 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 09 02:35:53 sh-101-19.int kernel: Lustre: 151528:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages
Mar 09 02:38:13 sh-101-19.int kernel: Lustre: 173649:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552127137/real 1552127137] req@ffff9bf8bb709b00 x1626164625817280/t0(0) o36->fir-MDT0000-mdc-ffff9c05b8776000@10.0.10.51@o2ib7:12/10 lens 784/1752 e 0 to 1 dl 1552127893 ref 2 fl Rpc:X/2/ffffffff rc -11/-1
Mar 09 02:38:13 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection to fir-MDT0000 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 09 02:38:13 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7)
Mar 09 02:48:29 sh-101-19.int kernel: Lustre: 114600:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552127753/real 1552127753] req@ffff9bfa49e4da00 x1626164627192592/t0(0) o41->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 440/368 e 0 to 1 dl 1552128509 ref 2 fl Rpc:IX/2/ffffffff rc -11/-1
Mar 09 02:48:29 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 09 02:48:29 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 09 02:48:29 sh-101-19.int kernel: Lustre: 114600:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
Mar 09 02:50:49 sh-101-19.int kernel: Lustre: 173649:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552127893/real 1552127893] req@ffff9bf8bb709b00 x1626164625817280/t0(0) o36->fir-MDT0000-mdc-ffff9c05b8776000@10.0.10.51@o2ib7:12/10 lens 784/1752 e 0 to 1 dl 1552128649 ref 2 fl Rpc:X/2/ffffffff rc -11/-1
Mar 09 02:50:49 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection to fir-MDT0000 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 09 02:50:49 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7)
Mar 09 02:50:49 sh-101-19.int kernel: Lustre: 173649:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Mar 09 03:01:05 sh-101-19.int kernel: Lustre: 151528:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552128509/real 1552128509] req@ffff9c007d216000 x1626164626261392/t0(0) o101->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 584/2088 e 0 to 1 dl 1552129265 ref 2 fl Rpc:X/2/ffffffff rc -11/-1
Mar 09 03:01:05 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 09 03:01:05 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 09 03:01:05 sh-101-19.int kernel: Lustre: 151528:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
Mar 09 03:03:25 sh-101-19.int kernel: Lustre: 120332:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552128649/real 1552128649] req@ffff9bfa17bb0f00 x1626164626666768/t0(0) o36->fir-MDT0000-mdc-ffff9c05b8776000@10.0.10.51@o2ib7:12/10 lens 528/1752 e 0 to 1 dl 1552129405 ref 2 fl Rpc:X/2/ffffffff rc -11/-1
Mar 09 03:03:25 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection to fir-MDT0000 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 09 03:03:25 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7)
Mar 09 03:13:41 sh-101-19.int kernel: Lustre: 114600:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552129265/real 1552129265] req@ffff9bfa49e4da00 x1626164627192592/t0(0) o41->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 440/368 e 0 to 1 dl 1552130021 ref 2 fl Rpc:IX/2/ffffffff rc -11/-1
Mar 09 03:13:41 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 09 03:13:41 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 09 03:13:41 sh-101-19.int kernel: Lustre: 114600:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages
Mar 09 03:16:01 sh-101-19.int kernel: Lustre:
173649:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552129405/real 1552129405] req@ffff9bf8bb709b00 x1626164625817280/t0(0) o36->fir-MDT0000-mdc-ffff9c05b8776000@10.0.10.51@o2ib7:12/10 lens 784/1752 e 0 to 1 dl 1552130161 ref 2 fl Rpc:X/2/ffffffff rc -11/-1 Mar 09 03:16:01 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection to fir-MDT0000 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 09 03:16:01 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7) Mar 09 03:16:01 sh-101-19.int kernel: Lustre: 173649:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Mar 09 03:26:17 sh-101-19.int kernel: Lustre: 173653:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552130021/real 1552130021] req@ffff9bf8ad593f00 x1626164625282624/t0(0) o36->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 528/1752 e 0 to 1 dl 1552130777 ref 2 fl Rpc:X/2/ffffffff rc -11/-1 Mar 09 03:26:17 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 09 03:26:17 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 09 03:26:17 sh-101-19.int kernel: Lustre: 173653:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Mar 09 03:28:37 sh-101-19.int kernel: Lustre: 120332:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552130161/real 1552130161] req@ffff9bfa17bb0f00 x1626164626666768/t0(0) o36->fir-MDT0000-mdc-ffff9c05b8776000@10.0.10.51@o2ib7:12/10 lens 528/1752 e 0 to 1 dl 1552130917 ref 2 fl 
Rpc:X/2/ffffffff rc -11/-1 Mar 09 03:28:37 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection to fir-MDT0000 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 09 03:28:37 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7) Mar 09 03:38:53 sh-101-19.int kernel: Lustre: 114600:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552130777/real 1552130777] req@ffff9bfa49e4da00 x1626164627192592/t0(0) o41->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 440/368 e 0 to 1 dl 1552131533 ref 2 fl Rpc:IX/2/ffffffff rc -11/-1 Mar 09 03:38:53 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 09 03:38:53 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 09 03:38:53 sh-101-19.int kernel: Lustre: 114600:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Mar 09 03:41:13 sh-101-19.int kernel: Lustre: 120332:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552130917/real 1552130917] req@ffff9bfa17bb0f00 x1626164626666768/t0(0) o36->fir-MDT0000-mdc-ffff9c05b8776000@10.0.10.51@o2ib7:12/10 lens 528/1752 e 0 to 1 dl 1552131673 ref 2 fl Rpc:X/2/ffffffff rc -11/-1 Mar 09 03:41:13 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection to fir-MDT0000 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 09 03:41:13 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7) Mar 09 03:41:13 sh-101-19.int kernel: Lustre: 
120332:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Mar 09 03:48:54 sh-101-19.int kernel: Lustre: 123866:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552131533/real 1552131533] req@ffff9bfa17bb5700 x1626164627871088/t0(0) o41->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 440/368 e 1 to 1 dl 1552132134 ref 2 fl Rpc:IX/2/ffffffff rc -11/-1 Mar 09 03:48:54 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 09 03:48:54 sh-101-19.int kernel: LustreError: 123866:0:(lmv_obd.c:1412:lmv_statfs()) can't stat MDS #0 (fir-MDT0003-mdc-ffff9c05b8776000), error -11 Mar 09 03:48:54 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 09 03:48:54 sh-101-19.int kernel: LustreError: 123866:0:(llite_lib.c:1807:ll_statfs_internal()) md_statfs fails: rc = -11 Mar 09 03:53:49 sh-101-19.int kernel: Lustre: 173649:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552131673/real 1552131673] req@ffff9bf8bb709b00 x1626164625817280/t0(0) o36->fir-MDT0000-mdc-ffff9c05b8776000@10.0.10.51@o2ib7:12/10 lens 784/1752 e 0 to 1 dl 1552132429 ref 2 fl Rpc:X/2/ffffffff rc -11/-1 Mar 09 03:53:49 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection to fir-MDT0000 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 09 03:53:49 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7) Mar 09 04:01:30 sh-101-19.int kernel: Lustre: 114600:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552132134/real 1552132134] 
req@ffff9bfa49e4da00 x1626164627192592/t0(0) o41->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 440/368 e 0 to 1 dl 1552132890 ref 2 fl Rpc:IX/2/ffffffff rc -11/-1 Mar 09 04:01:30 sh-101-19.int kernel: Lustre: 151528:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552132134/real 1552132134] req@ffff9c007d216000 x1626164626261392/t0(0) o101->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 584/2088 e 0 to 1 dl 1552132890 ref 2 fl Rpc:X/2/ffffffff rc -11/-1 Mar 09 04:01:30 sh-101-19.int kernel: Lustre: 45604:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552132134/real 1552132134] req@ffff9c058764b900 x1626164626351968/t0(0) o101->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 576/2088 e 0 to 1 dl 1552132890 ref 2 fl Rpc:X/2/ffffffff rc -11/-1 Mar 09 04:01:30 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 09 04:01:30 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 09 04:01:30 sh-101-19.int kernel: Lustre: 114600:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Mar 09 04:06:25 sh-101-19.int kernel: Lustre: 120332:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552132429/real 1552132429] req@ffff9bfa17bb0f00 x1626164626666768/t0(0) o36->fir-MDT0000-mdc-ffff9c05b8776000@10.0.10.51@o2ib7:12/10 lens 528/1752 e 0 to 1 dl 1552133185 ref 2 fl Rpc:X/2/ffffffff rc -11/-1 Mar 09 04:06:25 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection to fir-MDT0000 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 09 04:06:25 sh-101-19.int 
kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7) Mar 09 04:06:25 sh-101-19.int kernel: Lustre: 120332:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Mar 09 04:14:06 sh-101-19.int kernel: Lustre: 45604:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552132890/real 1552132890] req@ffff9c058764b900 x1626164626351968/t0(0) o101->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 576/2088 e 0 to 1 dl 1552133646 ref 2 fl Rpc:X/2/ffffffff rc -11/-1 Mar 09 04:14:06 sh-101-19.int kernel: Lustre: 151528:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552132890/real 1552132890] req@ffff9c007d216000 x1626164626261392/t0(0) o101->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 584/2088 e 0 to 1 dl 1552133646 ref 2 fl Rpc:X/2/ffffffff rc -11/-1 Mar 09 04:14:06 sh-101-19.int kernel: Lustre: 151528:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Mar 09 04:14:06 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 09 04:14:06 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 09 04:19:01 sh-101-19.int kernel: Lustre: 173649:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552133185/real 1552133185] req@ffff9bf8bb709b00 x1626164625817280/t0(0) o36->fir-MDT0000-mdc-ffff9c05b8776000@10.0.10.51@o2ib7:12/10 lens 784/1752 e 0 to 1 dl 1552133941 ref 2 fl Rpc:X/2/ffffffff rc -11/-1 Mar 09 04:19:01 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection to fir-MDT0000 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will 
wait for recovery to complete Mar 09 04:19:01 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7) Mar 09 04:26:42 sh-101-19.int kernel: Lustre: 173653:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552133646/real 1552133646] req@ffff9bf8ad593f00 x1626164625282624/t0(0) o36->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 528/1752 e 0 to 1 dl 1552134402 ref 2 fl Rpc:X/2/ffffffff rc -11/-1 Mar 09 04:26:42 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 09 04:26:42 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 09 04:26:42 sh-101-19.int kernel: Lustre: 173653:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Mar 09 04:31:37 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection to fir-MDT0000 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 09 04:31:37 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7) Mar 09 04:39:18 sh-101-19.int kernel: Lustre: 114600:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552134402/real 1552134402] req@ffff9bfa49e4da00 x1626164627192592/t0(0) o41->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 440/368 e 0 to 1 dl 1552135158 ref 2 fl Rpc:IX/2/ffffffff rc -11/-1 Mar 09 04:39:18 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 09 04:39:18 sh-101-19.int kernel: Lustre: 
fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 09 04:39:18 sh-101-19.int kernel: Lustre: 114600:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages Mar 09 04:51:54 sh-101-19.int kernel: Lustre: 173653:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552135158/real 1552135158] req@ffff9bf8ad593f00 x1626164625282624/t0(0) o36->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 528/1752 e 0 to 1 dl 1552135914 ref 2 fl Rpc:X/2/ffffffff rc -11/-1 Mar 09 04:51:54 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 09 04:51:54 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 09 04:51:54 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 09 04:51:54 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 09 04:51:54 sh-101-19.int kernel: Lustre: 173653:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages Mar 09 05:04:30 sh-101-19.int kernel: Lustre: 114600:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552135914/real 1552135914] req@ffff9bfa49e4da00 x1626164627192592/t0(0) o41->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 440/368 e 0 to 1 dl 1552136670 ref 2 fl Rpc:IX/2/ffffffff rc -11/-1 Mar 09 05:04:30 sh-101-19.int kernel: Lustre: 45604:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552135914/real 1552135914] req@ffff9c058764b900 x1626164626351968/t0(0) o101->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 576/2088 e 0 to 1 dl 1552136670 ref 2 fl Rpc:X/2/ffffffff rc -11/-1 Mar 09 05:04:30 sh-101-19.int kernel: 
Lustre: 45604:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Mar 09 05:04:30 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 09 05:04:30 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 09 05:04:30 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 09 05:04:30 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 09 05:04:30 sh-101-19.int kernel: Lustre: 114600:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Mar 09 05:17:06 sh-101-19.int kernel: Lustre: 151528:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552136670/real 1552136670] req@ffff9c007d216000 x1626164626261392/t0(0) o101->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 584/2088 e 0 to 1 dl 1552137426 ref 2 fl Rpc:X/2/ffffffff rc -11/-1 Mar 09 05:17:06 sh-101-19.int kernel: Lustre: 45604:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552136670/real 1552136670] req@ffff9c058764b900 x1626164626351968/t0(0) o101->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 576/2088 e 0 to 1 dl 1552137426 ref 2 fl Rpc:X/2/ffffffff rc -11/-1 Mar 09 05:17:06 sh-101-19.int kernel: Lustre: 45604:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Mar 09 05:17:06 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 09 05:17:06 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 09 05:17:06 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored 
to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 09 05:17:06 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 09 05:17:06 sh-101-19.int kernel: Lustre: 151528:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Mar 09 05:29:42 sh-101-19.int kernel: Lustre: 133327:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552137426/real 1552137426] req@ffff9c020236b600 x1626164628545904/t0(0) o41->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 440/368 e 0 to 1 dl 1552138182 ref 2 fl Rpc:IX/2/ffffffff rc -11/-1 Mar 09 05:29:42 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 09 05:29:42 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 09 05:29:42 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 09 05:29:42 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 09 05:29:42 sh-101-19.int kernel: Lustre: 133327:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages Mar 09 05:39:43 sh-101-19.int kernel: Lustre: 134066:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552138182/real 1552138182] req@ffff9bfd9429d700 x1626164628595552/t0(0) o41->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 440/368 e 1 to 1 dl 1552138783 ref 2 fl Rpc:IX/2/ffffffff rc -11/-1 Mar 09 05:39:43 sh-101-19.int kernel: Lustre: 134066:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Mar 09 05:39:43 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 09 05:39:43 
sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 09 05:39:43 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 09 05:39:43 sh-101-19.int kernel: LustreError: 134066:0:(lmv_obd.c:1412:lmv_statfs()) can't stat MDS #0 (fir-MDT0003-mdc-ffff9c05b8776000), error -11 Mar 09 05:39:43 sh-101-19.int kernel: LustreError: 134066:0:(llite_lib.c:1807:ll_statfs_internal()) md_statfs fails: rc = -11 Mar 09 05:39:43 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 09 05:52:19 sh-101-19.int kernel: Lustre: 173653:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552138783/real 1552138783] req@ffff9bf8ad593f00 x1626164625282624/t0(0) o36->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 528/1752 e 0 to 1 dl 1552139539 ref 2 fl Rpc:X/2/ffffffff rc -11/-1 Mar 09 05:52:19 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 09 05:52:19 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 09 05:52:19 sh-101-19.int kernel: LustreError: 133327:0:(lmv_obd.c:1412:lmv_statfs()) can't stat MDS #0 (fir-MDT0003-mdc-ffff9c05b8776000), error -4 Mar 09 05:52:19 sh-101-19.int kernel: LustreError: 133327:0:(lmv_obd.c:1412:lmv_statfs()) Skipped 1 previous similar message Mar 09 05:52:19 sh-101-19.int kernel: LustreError: 114600:0:(llite_lib.c:1807:ll_statfs_internal()) md_statfs fails: rc = -4 Mar 09 05:52:19 sh-101-19.int kernel: LustreError: 114600:0:(llite_lib.c:1807:ll_statfs_internal()) Skipped 1 previous similar message Mar 09 05:52:19 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 09 05:52:19 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 09 05:52:19 
sh-101-19.int kernel: Lustre: 173653:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 6 previous similar messages Mar 09 06:03:48 sh-101-19.int kernel: Lustre: 137159:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552139627/real 1552139627] req@ffff9bfa96236600 x1626164628827216/t0(0) o41->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 440/368 e 1 to 1 dl 1552140228 ref 2 fl Rpc:IX/0/ffffffff rc 0/-1 Mar 09 06:03:48 sh-101-19.int kernel: Lustre: 137159:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Mar 09 06:03:48 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 09 06:03:48 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 09 06:03:48 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 09 06:03:48 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 09 06:13:49 sh-101-19.int kernel: Lustre: 137159:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552140228/real 1552140228] req@ffff9bfa96236600 x1626164628827216/t0(0) o41->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 440/368 e 1 to 1 dl 1552140829 ref 2 fl Rpc:IX/2/ffffffff rc -11/-1 Mar 09 06:13:49 sh-101-19.int kernel: Lustre: 137159:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Mar 09 06:13:49 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 09 06:13:49 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 09 06:13:49 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: 
Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 09 06:13:49 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 09 06:23:50 sh-101-19.int kernel: Lustre: 137159:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552140829/real 1552140829] req@ffff9bfa96236600 x1626164628827216/t0(0) o41->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 440/368 e 1 to 1 dl 1552141430 ref 2 fl Rpc:IX/2/ffffffff rc -11/-1 Mar 09 06:23:50 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 09 06:23:50 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 09 06:33:51 sh-101-19.int kernel: Lustre: 137159:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552141430/real 1552141430] req@ffff9bfa96236600 x1626164628827216/t0(0) o41->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 440/368 e 1 to 1 dl 1552142031 ref 2 fl Rpc:IX/2/ffffffff rc -11/-1 Mar 09 06:33:51 sh-101-19.int kernel: Lustre: 137159:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Mar 09 06:33:51 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 09 06:33:51 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 09 06:33:51 sh-101-19.int kernel: LustreError: 137159:0:(lmv_obd.c:1412:lmv_statfs()) can't stat MDS #0 (fir-MDT0003-mdc-ffff9c05b8776000), error -4 Mar 09 06:33:51 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 09 06:33:51 sh-101-19.int kernel: Lustre: Skipped 1 
previous similar message Mar 09 06:33:51 sh-101-19.int kernel: LustreError: 137159:0:(llite_lib.c:1807:ll_statfs_internal()) md_statfs fails: rc = -4 Mar 09 06:43:58 sh-101-19.int kernel: Lustre: 138642:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552142031/real 1552142031] req@ffff9bf6e566ce00 x1626164628934704/t0(0) o41->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 440/368 e 0 to 1 dl 1552142638 ref 2 fl Rpc:IX/2/ffffffff rc -11/-1 Mar 09 06:43:58 sh-101-19.int kernel: Lustre: 138642:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Mar 09 06:43:58 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 09 06:43:58 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 09 06:43:58 sh-101-19.int kernel: LustreError: 138642:0:(lmv_obd.c:1412:lmv_statfs()) can't stat MDS #0 (fir-MDT0003-mdc-ffff9c05b8776000), error -4 Mar 09 06:43:58 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 09 06:43:58 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 09 06:43:58 sh-101-19.int kernel: LustreError: 138642:0:(llite_lib.c:1807:ll_statfs_internal()) md_statfs fails: rc = -4 Mar 09 06:56:34 sh-101-19.int kernel: Lustre: 173653:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552142638/real 1552142638] req@ffff9bf8ad593f00 x1626164625282624/t0(0) o36->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 528/1752 e 0 to 1 dl 1552143394 ref 2 fl Rpc:X/2/ffffffff rc -11/-1 Mar 09 06:56:34 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for 
recovery to complete Mar 09 06:56:34 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 09 06:56:34 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 09 06:56:34 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 09 06:56:34 sh-101-19.int kernel: Lustre: 173653:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages Mar 09 07:09:10 sh-101-19.int kernel: Lustre: 45604:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552143394/real 1552143394] req@ffff9c058764b900 x1626164626351968/t0(0) o101->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 576/2088 e 0 to 1 dl 1552144150 ref 2 fl Rpc:X/2/ffffffff rc -11/-1 Mar 09 07:09:10 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 09 07:09:10 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 09 07:09:10 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 09 07:09:10 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 09 07:09:10 sh-101-19.int kernel: Lustre: 45604:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages Mar 09 07:21:46 sh-101-19.int kernel: Lustre: 144057:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552144150/real 1552144150] req@ffff9c01e3d61500 x1626164629345600/t0(0) o41->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 440/368 e 0 to 1 dl 1552144906 ref 2 fl Rpc:IX/2/ffffffff rc -11/-1 Mar 09 07:21:46 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using 
this service will wait for recovery to complete Mar 09 07:21:46 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 09 07:21:46 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 09 07:21:46 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 09 07:21:46 sh-101-19.int kernel: Lustre: 144057:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages Mar 09 07:34:22 sh-101-19.int kernel: Lustre: 151528:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552144906/real 1552144906] req@ffff9c007d216000 x1626164626261392/t0(0) o101->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 584/2088 e 0 to 1 dl 1552145662 ref 2 fl Rpc:X/2/ffffffff rc -11/-1 Mar 09 07:34:22 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 09 07:34:22 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 09 07:34:22 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 09 07:34:22 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 09 07:34:22 sh-101-19.int kernel: Lustre: 151528:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages Mar 09 07:46:58 sh-101-19.int kernel: Lustre: 45604:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552145662/real 1552145662] req@ffff9c058764b900 x1626164626351968/t0(0) o101->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 576/2088 e 0 to 1 dl 1552146418 ref 2 fl Rpc:X/2/ffffffff rc -11/-1 Mar 09 07:46:58 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; 
in progress operations using this service will wait for recovery to complete Mar 09 07:46:58 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 09 07:46:58 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 09 07:46:58 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 09 07:46:58 sh-101-19.int kernel: Lustre: 45604:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages Mar 09 07:59:34 sh-101-19.int kernel: Lustre: 173653:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552146418/real 1552146418] req@ffff9bf8ad593f00 x1626164625282624/t0(0) o36->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 528/1752 e 0 to 1 dl 1552147174 ref 2 fl Rpc:X/2/ffffffff rc -11/-1 Mar 09 07:59:34 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 09 07:59:34 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 09 07:59:34 sh-101-19.int kernel: LustreError: 144057:0:(lmv_obd.c:1412:lmv_statfs()) can't stat MDS #0 (fir-MDT0003-mdc-ffff9c05b8776000), error -4 Mar 09 07:59:34 sh-101-19.int kernel: LustreError: 144057:0:(llite_lib.c:1807:ll_statfs_internal()) md_statfs fails: rc = -4 Mar 09 07:59:34 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 09 07:59:34 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 09 07:59:34 sh-101-19.int kernel: Lustre: 173653:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages Mar 09 08:12:10 sh-101-19.int kernel: Lustre: 45604:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552147174/real 1552147174] 
req@ffff9c058764b900 x1626164626351968/t0(0) o101->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 576/2088 e 0 to 1 dl 1552147930 ref 2 fl Rpc:X/2/ffffffff rc -11/-1 Mar 09 08:12:10 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 09 08:12:10 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 09 08:12:10 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 09 08:12:10 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 09 08:12:10 sh-101-19.int kernel: Lustre: 45604:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages Mar 09 08:22:11 sh-101-19.int kernel: Lustre: 149667:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552147930/real 1552147930] req@ffff9bfcaf2b2100 x1626164629756656/t0(0) o41->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 440/368 e 1 to 1 dl 1552148531 ref 2 fl Rpc:IX/2/ffffffff rc -11/-1 Mar 09 08:22:11 sh-101-19.int kernel: Lustre: 149667:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Mar 09 08:22:11 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 09 08:22:11 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 09 08:22:11 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 09 08:22:11 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 09 08:32:12 sh-101-19.int kernel: Lustre: 151112:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 
1552148531/real 1552148531] req@ffff9bfcaf2b3600 x1626164629858752/t0(0) o41->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 440/368 e 1 to 1 dl 1552149132 ref 2 fl Rpc:IX/2/ffffffff rc -11/-1 Mar 09 08:32:12 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 09 08:32:12 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 09 08:32:12 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 09 08:32:12 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 09 08:32:12 sh-101-19.int kernel: Lustre: 151112:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages Mar 09 08:42:13 sh-101-19.int kernel: Lustre: 151858:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552149132/real 1552149132] req@ffff9bfc11e3c200 x1626164629911200/t0(0) o41->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 440/368 e 1 to 1 dl 1552149733 ref 2 fl Rpc:IX/2/ffffffff rc -11/-1 Mar 09 08:42:13 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 09 08:42:13 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 09 08:42:13 sh-101-19.int kernel: Lustre: 151858:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Mar 09 08:52:14 sh-101-19.int kernel: Lustre: 151112:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552149733/real 1552149733] req@ffff9bfcaf2b3600 x1626164629858752/t0(0) o41->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 
lens 440/368 e 1 to 1 dl 1552150334 ref 2 fl Rpc:IX/2/ffffffff rc -11/-1 Mar 09 08:52:14 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 09 08:52:14 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 09 08:52:14 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 09 08:52:14 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 09 08:52:14 sh-101-19.int kernel: Lustre: 151112:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages Mar 09 09:02:15 sh-101-19.int kernel: Lustre: 149667:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552150334/real 1552150334] req@ffff9bfcaf2b2100 x1626164629756656/t0(0) o41->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 440/368 e 1 to 1 dl 1552150935 ref 2 fl Rpc:IX/2/ffffffff rc -11/-1 Mar 09 09:02:15 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 09 09:02:15 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 09 09:02:15 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 09 09:02:15 sh-101-19.int kernel: LustreError: 151858:0:(lmv_obd.c:1412:lmv_statfs()) can't stat MDS #0 (fir-MDT0003-mdc-ffff9c05b8776000), error -11 Mar 09 09:02:15 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 09 09:02:15 sh-101-19.int kernel: LustreError: 151858:0:(llite_lib.c:1807:ll_statfs_internal()) md_statfs fails: rc = -11 Mar 09 09:02:15 sh-101-19.int kernel: Lustre: 149667:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 
previous similar messages Mar 09 09:12:16 sh-101-19.int kernel: Lustre: 154834:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552150935/real 1552150935] req@ffff9bfce4372d00 x1626164630131328/t0(0) o41->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 440/368 e 1 to 1 dl 1552151536 ref 2 fl Rpc:IX/2/ffffffff rc -11/-1 Mar 09 09:12:16 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 09 09:12:16 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 09 09:12:16 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 09 09:12:16 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 09 09:12:16 sh-101-19.int kernel: Lustre: 154834:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages Mar 09 09:22:17 sh-101-19.int kernel: Lustre: 149667:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552151536/real 1552151536] req@ffff9bfcaf2b2100 x1626164629756656/t0(0) o41->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 440/368 e 1 to 1 dl 1552152137 ref 2 fl Rpc:IX/2/ffffffff rc -11/-1 Mar 09 09:22:17 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 09 09:22:17 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 09 09:22:17 sh-101-19.int kernel: LustreError: 151112:0:(lmv_obd.c:1412:lmv_statfs()) can't stat MDS #0 (fir-MDT0003-mdc-ffff9c05b8776000), error -4 Mar 09 09:22:17 sh-101-19.int kernel: LustreError: 151112:0:(llite_lib.c:1807:ll_statfs_internal()) md_statfs fails: rc = -4 Mar 09 09:22:17 
sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 09 09:22:17 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 09 09:22:17 sh-101-19.int kernel: Lustre: 149667:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages Mar 09 09:32:18 sh-101-19.int kernel: Lustre: 154834:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552152137/real 1552152137] req@ffff9bfce4372d00 x1626164630131328/t0(0) o41->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 440/368 e 1 to 1 dl 1552152738 ref 2 fl Rpc:IX/2/ffffffff rc -11/-1 Mar 09 09:32:18 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 09 09:32:18 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 09 09:42:19 sh-101-19.int kernel: Lustre: 149667:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552152738/real 1552152738] req@ffff9bfcaf2b2100 x1626164629756656/t0(0) o41->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 440/368 e 1 to 1 dl 1552153339 ref 2 fl Rpc:IX/2/ffffffff rc -11/-1 Mar 09 09:42:19 sh-101-19.int kernel: Lustre: 157906:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552152738/real 1552152738] req@ffff9bfd9429d700 x1626164630356864/t0(0) o41->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 440/368 e 1 to 1 dl 1552153339 ref 2 fl Rpc:IX/2/ffffffff rc -11/-1 Mar 09 09:42:19 sh-101-19.int kernel: Lustre: 157906:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Mar 09 09:42:19 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: 
Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 09 09:42:19 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 09 09:42:19 sh-101-19.int kernel: LustreError: 157906:0:(lmv_obd.c:1412:lmv_statfs()) can't stat MDS #0 (fir-MDT0003-mdc-ffff9c05b8776000), error -4 Mar 09 09:42:19 sh-101-19.int kernel: LustreError: 154834:0:(lmv_obd.c:1412:lmv_statfs()) can't stat MDS #0 (fir-MDT0003-mdc-ffff9c05b8776000), error -4 Mar 09 09:42:19 sh-101-19.int kernel: LustreError: 157906:0:(llite_lib.c:1807:ll_statfs_internal()) md_statfs fails: rc = -4 Mar 09 09:42:19 sh-101-19.int kernel: LustreError: 157906:0:(llite_lib.c:1807:ll_statfs_internal()) Skipped 1 previous similar message Mar 09 09:42:19 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 09 09:42:19 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 09 09:42:19 sh-101-19.int kernel: Lustre: 149667:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Mar 09 09:52:20 sh-101-19.int kernel: Lustre: 149667:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552153339/real 1552153339] req@ffff9bfcaf2b2100 x1626164629756656/t0(0) o41->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 440/368 e 1 to 1 dl 1552153940 ref 2 fl Rpc:IX/2/ffffffff rc -11/-1 Mar 09 09:52:20 sh-101-19.int kernel: Lustre: 149667:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Mar 09 09:52:20 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 09 09:52:20 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 09 09:52:20 sh-101-19.int kernel: LustreError: 
149667:0:(lmv_obd.c:1412:lmv_statfs()) can't stat MDS #0 (fir-MDT0003-mdc-ffff9c05b8776000), error -4 Mar 09 09:52:20 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 09 09:52:20 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 09 09:52:20 sh-101-19.int kernel: LustreError: 149667:0:(llite_lib.c:1807:ll_statfs_internal()) md_statfs fails: rc = -4 Mar 09 10:04:56 sh-101-19.int kernel: Lustre: 151528:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552153940/real 1552153940] req@ffff9c007d216000 x1626164626261392/t0(0) o101->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 584/2088 e 0 to 1 dl 1552154696 ref 2 fl Rpc:X/2/ffffffff rc -11/-1 Mar 09 10:04:56 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 09 10:04:56 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 09 10:04:56 sh-101-19.int kernel: LustreError: 158651:0:(lmv_obd.c:1412:lmv_statfs()) can't stat MDS #0 (fir-MDT0003-mdc-ffff9c05b8776000), error -4 Mar 09 10:04:56 sh-101-19.int kernel: LustreError: 158651:0:(llite_lib.c:1807:ll_statfs_internal()) md_statfs fails: rc = -4 Mar 09 10:04:56 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 09 10:04:56 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 09 10:04:56 sh-101-19.int kernel: Lustre: 151528:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 6 previous similar messages Mar 09 10:10:04 sh-101-19.int kernel: LustreError: 166-1: MGC10.0.10.51@o2ib7: Connection to MGS (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will fail Mar 09 10:10:04 sh-101-19.int kernel: LustreError: Skipped 2 previous 
similar messages Mar 09 10:14:57 sh-101-19.int kernel: Lustre: 160326:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552154696/real 1552154696] req@ffff9c04b8300300 x1626164630538144/t0(0) o41->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.51@o2ib7:12/10 lens 440/368 e 1 to 1 dl 1552155297 ref 2 fl Rpc:IX/2/ffffffff rc -11/-1 Mar 09 10:14:57 sh-101-19.int kernel: Lustre: 160326:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 7 previous similar messages Mar 09 10:14:57 sh-101-19.int kernel: LustreError: 160326:0:(lmv_obd.c:1412:lmv_statfs()) can't stat MDS #0 (fir-MDT0003-mdc-ffff9c05b8776000), error -4 Mar 09 10:14:57 sh-101-19.int kernel: LustreError: 160326:0:(llite_lib.c:1807:ll_statfs_internal()) md_statfs fails: rc = -4 Mar 09 10:17:11 sh-101-19.int kernel: LustreError: 93465:0:(client.c:3023:ptlrpc_replay_interpret()) @@@ status 301, old was 0 req@ffff9bf8bb70bf00 x1626164625817264/t64433521016(64433521016) o101->fir-MDT0000-mdc-ffff9c05b8776000@10.0.10.51@o2ib7:12/10 lens 1768/560 e 0 to 0 dl 1552155438 ref 2 fl Interpret:RP/4/0 rc 301/301 Mar 09 10:17:11 sh-101-19.int kernel: LustreError: 93465:0:(client.c:3023:ptlrpc_replay_interpret()) Skipped 3 previous similar messages Mar 09 10:17:32 sh-101-19.int kernel: LustreError: 150398:0:(lmv_obd.c:1412:lmv_statfs()) can't stat MDS #0 (fir-MDT0003-mdc-ffff9c05b8776000), error 301 Mar 09 10:17:32 sh-101-19.int kernel: LustreError: 150398:0:(llite_lib.c:1807:ll_statfs_internal()) md_statfs fails: rc = 301 Mar 09 10:18:14 sh-101-19.int kernel: LustreError: 162111:0:(file.c:4393:ll_inode_revalidate_fini()) fir: revalidate FID [0x200000007:0x1:0x0] error: rc = -4 Mar 09 10:18:27 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 09 12:18:15 sh-101-19.int kernel: Lustre: 45604:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 
1552162093/real 1552162093] req@ffff9c03f5f58900 x1626164631660832/t0(0) o36->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 528/1752 e 4 to 1 dl 1552162695 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Mar 09 12:18:15 sh-101-19.int kernel: Lustre: 45604:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages Mar 09 12:18:15 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 09 12:18:15 sh-101-19.int kernel: Lustre: Skipped 4 previous similar messages Mar 09 12:18:15 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 09 12:18:15 sh-101-19.int kernel: Lustre: Skipped 3 previous similar messages Mar 09 12:28:08 sh-101-19.int kernel: Lustre: 120332:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552162695/real 1552162695] req@ffff9c03f5f58600 x1626164631704752/t0(0) o101->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 1768/10616 e 0 to 1 dl 1552163288 ref 2 fl Rpc:XP/2/ffffffff rc -11/-1 Mar 09 12:28:08 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 09 12:28:08 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 09 12:34:13 sh-101-19.int kernel: Lustre: 173653:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552162897/real 1552162897] req@ffff9c04c47c1b00 x1626164631761408/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 584/2088 e 0 to 1 dl 1552163653 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Mar 09 12:34:13 sh-101-19.int kernel: Lustre: 
fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 09 12:34:13 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 09 12:46:49 sh-101-19.int kernel: Lustre: 173650:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552163653/real 1552163653] req@ffff9bea7e62ce00 x1626164631841536/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 640/2088 e 0 to 1 dl 1552164409 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Mar 09 12:46:49 sh-101-19.int kernel: Lustre: 173650:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Mar 09 12:46:49 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 09 12:46:49 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 09 12:46:49 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 09 12:46:49 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 09 12:57:47 sh-101-19.int kernel: Lustre: 120332:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552164474/real 1552164474] req@ffff9c03f5f58600 x1626164631704752/t0(0) o101->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 1768/10616 e 0 to 1 dl 1552165067 ref 2 fl Rpc:XP/2/ffffffff rc -11/-1 Mar 09 12:57:47 sh-101-19.int kernel: Lustre: 120332:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Mar 09 12:57:47 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service 
will wait for recovery to complete Mar 09 12:57:47 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 09 12:57:48 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 09 12:57:48 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 09 13:09:27 sh-101-19.int kernel: Lustre: 89529:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552165166/real 1552165166] req@ffff9bf4ae0aa700 x1626164632019664/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 1768/10616 e 1 to 1 dl 1552165767 ref 2 fl Rpc:XP/0/ffffffff rc 0/-1 Mar 09 13:09:27 sh-101-19.int kernel: Lustre: 89529:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Mar 09 13:09:27 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 09 13:09:28 sh-101-19.int kernel: Lustre: Skipped 2 previous similar messages Mar 09 13:09:28 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 09 13:09:28 sh-101-19.int kernel: Lustre: Skipped 2 previous similar messages Mar 09 13:19:29 sh-101-19.int kernel: Lustre: 173650:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552165768/real 1552165768] req@ffff9bff7291a700 x1626164632019792/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 328/344 e 1 to 1 dl 1552166369 ref 2 fl Rpc:X/2/ffffffff rc -11/-1 Mar 09 13:19:29 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 09 13:19:29 sh-101-19.int kernel: Lustre: Skipped 1 previous 
similar message Mar 09 13:19:29 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 09 13:19:29 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 09 13:19:29 sh-101-19.int kernel: Lustre: 173650:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Mar 09 13:29:30 sh-101-19.int kernel: Lustre: 41152:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552166369/real 1552166369] req@ffff9bfa96232100 x1626164632089664/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 328/344 e 1 to 1 dl 1552166970 ref 2 fl Rpc:X/2/ffffffff rc -11/-1 Mar 09 13:29:30 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 09 13:29:30 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 09 13:29:30 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 09 13:29:30 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 09 13:29:30 sh-101-19.int kernel: Lustre: 41152:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Mar 09 13:39:31 sh-101-19.int kernel: Lustre: 41152:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552166970/real 1552166970] req@ffff9bfa96230c00 x1626164632226032/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 328/344 e 1 to 1 dl 1552167571 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Mar 09 13:39:31 sh-101-19.int kernel: Lustre: 41152:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Mar 09 13:39:31 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 
10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 09 13:39:31 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 09 13:39:31 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 09 13:39:31 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 09 13:45:46 sh-101-19.int kernel: LustreError: 180806:0:(lmv_obd.c:1412:lmv_statfs()) can't stat MDS #0 (fir-MDT0003-mdc-ffff9c05b8776000), error -4 Mar 09 13:45:46 sh-101-19.int kernel: LustreError: 180806:0:(llite_lib.c:1807:ll_statfs_internal()) md_statfs fails: rc = -4 Mar 09 13:49:32 sh-101-19.int kernel: Lustre: 41152:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552167571/real 1552167571] req@ffff9bfa96230c00 x1626164632226032/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 328/344 e 1 to 1 dl 1552168172 ref 2 fl Rpc:X/2/ffffffff rc -11/-1 Mar 09 13:49:32 sh-101-19.int kernel: Lustre: 41152:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Mar 09 13:49:32 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 09 13:49:32 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 09 13:49:32 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 09 13:49:32 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 09 13:59:33 sh-101-19.int kernel: Lustre: 41152:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552168172/real 1552168172] req@ffff9bfa96230c00 x1626164632226032/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 328/344 e 1 
to 1 dl 1552168773 ref 2 fl Rpc:X/2/ffffffff rc -11/-1 Mar 09 13:59:33 sh-101-19.int kernel: Lustre: 41152:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Mar 09 13:59:33 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 09 13:59:33 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 09 13:59:33 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 09 13:59:33 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 09 14:12:09 sh-101-19.int kernel: Lustre: 111494:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552168773/real 1552168773] req@ffff9c058dd61e00 x1626164632362864/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 608/2088 e 0 to 1 dl 1552169529 ref 2 fl Rpc:X/2/ffffffff rc -11/-1 Mar 09 14:12:09 sh-101-19.int kernel: Lustre: 111494:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Mar 09 14:12:09 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 09 14:12:09 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 09 14:12:09 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 09 14:12:09 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 09 14:22:11 sh-101-19.int kernel: Lustre: 41152:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552169529/real 1552169529] req@ffff9bf88fcb3000 x1626164632522208/t0(0) 
o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 568/3376 e 1 to 1 dl 1552170131 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Mar 09 14:22:11 sh-101-19.int kernel: Lustre: 41152:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Mar 09 14:22:11 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 09 14:22:11 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 09 14:22:11 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 09 14:22:11 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 09 14:30:36 sh-101-19.int kernel: LustreError: 11-0: fir-MDT0003-mdc-ffff9c05b8776000: operation mds_reint to node 10.0.10.52@o2ib7 failed: rc = -19 Mar 09 14:30:36 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message Mar 09 14:33:13 sh-101-19.int kernel: LustreError: 93465:0:(import.c:1247:ptlrpc_connect_interpret()) fir-MDT0002_UUID went back in time (transno 73023254053 was previously committed, server now claims 60407812735)! 
See https://bugzilla.lustre.org/show_bug.cgi?id=9646 Mar 09 14:33:13 sh-101-19.int kernel: LustreError: 93465:0:(client.c:3023:ptlrpc_replay_interpret()) @@@ status 301, old was 0 req@ffff9c058dd64800 x1626164632290544/t64483390427(64483390427) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 1176/560 e 0 to 0 dl 1552171549 ref 2 fl Interpret:RP/4/0 rc 301/301 Mar 09 14:33:13 sh-101-19.int kernel: LustreError: 93465:0:(client.c:3023:ptlrpc_replay_interpret()) Skipped 5 previous similar messages Mar 09 14:34:03 sh-101-19.int kernel: LustreError: 93465:0:(client.c:3023:ptlrpc_replay_interpret()) @@@ status 301, old was 0 req@ffff9c03f5f5e000 x1626164631660704/t68840102069(68840102069) o101->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 1768/560 e 0 to 0 dl 1552171599 ref 2 fl Interpret:RP/4/0 rc 301/301 Mar 09 14:34:28 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 09 14:34:28 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 09 16:03:10 sh-101-19.int kernel: NFS: nfs4_reclaim_open_state: unhandled error -521 Mar 09 23:51:28 sh-101-19.int kernel: NFS: nfs4_reclaim_open_state: unhandled error -521 Mar 10 03:27:56 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 03:27:56 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 03:27:56 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 03:27:56 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 03:27:56 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 03:28:01 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 03:28:01 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 03:28:01 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 03:28:02 
sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 03:28:02 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 03:28:05 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 03:28:08 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 03:28:25 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 03:28:59 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 03:30:37 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 03:30:38 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 03:31:08 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 03:31:13 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 03:31:25 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 03:31:30 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 03:32:12 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 03:33:46 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 03:34:08 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 03:34:18 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 03:34:25 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 03:34:35 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 03:35:22 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 03:36:54 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 03:37:09 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 03:37:23 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 03:37:25 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 03:37:40 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 03:37:59 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 03:38:29 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 03:40:05 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 03:40:09 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 03:40:25 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 03:40:29 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 03:40:45 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 03:41:38 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 03:43:09 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 03:43:13 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 03:43:26 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 03:43:34 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 03:43:51 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 03:44:49 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 03:46:09 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 03:46:22 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 03:46:26 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 03:46:39 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 03:46:56 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 03:47:58 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 03:49:09 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 03:49:26 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 03:49:33 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 03:49:44 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 03:50:01 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 03:51:09 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 03:52:09 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 03:52:26 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 03:52:40 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 03:52:49 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 03:53:06 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 03:54:16 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 03:55:09 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 03:55:26 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 03:55:52 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 03:55:54 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 03:56:11 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 03:57:24 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 03:58:09 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 03:58:26 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 03:58:59 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 03:59:01 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 03:59:16 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:00:36 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 04:01:09 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:01:26 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:02:04 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:02:08 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 04:02:21 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:03:44 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 04:04:09 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:04:26 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:05:10 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:05:17 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 04:05:26 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:06:52 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 04:07:10 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:07:26 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:08:15 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:08:30 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 04:08:32 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:10:02 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 04:10:10 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:10:26 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:11:20 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:11:36 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 04:11:37 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:13:10 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:13:11 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 04:13:26 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:14:25 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:14:42 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:14:48 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 04:16:10 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:16:24 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 04:16:27 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:17:30 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:17:47 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:17:55 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 04:19:10 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:19:27 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:19:30 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 04:20:35 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:20:52 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:21:06 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 04:22:10 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:22:27 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:22:42 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 04:23:40 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:23:57 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:24:14 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 04:25:10 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:25:27 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:25:52 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 04:26:45 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:27:02 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:27:26 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 04:28:10 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:28:27 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:29:02 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 04:29:50 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:30:08 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:30:35 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 04:31:10 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:31:27 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:32:08 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 04:32:56 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:33:13 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:33:43 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 04:34:10 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:34:27 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:35:18 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 04:36:01 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:36:18 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:36:53 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 04:37:10 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:37:27 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:38:28 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 04:39:06 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:39:23 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:40:03 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 04:40:10 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:40:27 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:41:38 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 04:42:11 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:42:28 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:43:11 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:43:11 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 04:43:27 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:44:49 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 04:45:16 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:45:33 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:46:11 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:46:20 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 04:46:28 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:47:54 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 04:48:21 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:48:38 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:49:11 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:49:28 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:49:29 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 04:51:07 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 04:51:26 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:51:44 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:52:11 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:52:28 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:52:41 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 04:54:16 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 04:54:31 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:54:49 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:55:11 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:55:28 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:55:49 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 04:57:26 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 04:57:37 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:57:55 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:58:11 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:58:28 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 04:58:58 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 05:00:33 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 05:00:42 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:01:00 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:01:11 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:01:28 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:02:08 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 05:03:44 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 05:03:47 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:04:05 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:04:11 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:04:28 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:05:18 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 05:06:52 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:06:54 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 05:07:10 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:07:11 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:07:28 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:08:29 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 05:09:57 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:10:00 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 05:10:11 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:10:15 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:10:28 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:11:36 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 05:13:02 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:13:10 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 05:13:11 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:13:20 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:13:28 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:14:47 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 05:16:07 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:16:11 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:16:21 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 05:16:25 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:16:28 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:17:55 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 05:19:12 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:19:12 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:19:28 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:19:30 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:19:32 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 05:21:04 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 05:22:12 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:22:17 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:22:29 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:22:35 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:22:39 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 05:24:17 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 05:25:12 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:25:22 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:25:29 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:25:40 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:25:50 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 05:27:25 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 05:28:12 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:28:27 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:28:29 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:28:46 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:28:57 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 05:30:35 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 05:31:12 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:31:29 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:31:32 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:31:51 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:32:07 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 05:33:44 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 05:34:12 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:34:29 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:34:38 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:34:56 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:35:18 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 05:36:53 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 05:37:12 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:37:29 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:37:43 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:38:01 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:38:27 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 05:40:03 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 05:40:12 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:40:29 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:40:48 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:41:06 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:41:35 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 05:43:12 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:43:12 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 05:43:29 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:43:53 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:44:11 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:44:47 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 05:46:12 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:46:21 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 05:46:29 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:46:58 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:47:17 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:47:54 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 05:49:13 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:49:29 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 05:49:29 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:50:03 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:50:22 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:51:04 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 05:52:13 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:52:30 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:52:38 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 05:53:09 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:53:27 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:54:13 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 05:55:13 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:55:30 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:55:50 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 05:56:14 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:56:32 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:57:24 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 05:58:13 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:58:30 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:58:59 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 05:59:19 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 05:59:37 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 06:00:32 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 06:01:13 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 06:01:30 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 06:02:08 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 06:02:24 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 06:02:42 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 06:03:44 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 06:04:13 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 06:04:30 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 06:05:15 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 06:05:29 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 06:05:47 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 06:06:53 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 06:07:13 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 06:07:30 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 06:08:26 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 06:08:34 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 06:08:53 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 06:09:59 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 06:10:13 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 06:10:30 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 06:11:04 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:04 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:04 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:04 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:04 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:04 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:04 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:04 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:04 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:12 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:12 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:13 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:14 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:14 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:14 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:14 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:14 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:14 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:15 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:15 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:15 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:15 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:15 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:16 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:16 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:16 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:16 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:16 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:16 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:17 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:17 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:17 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:17 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:18 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:18 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:18 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:18 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:19 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:19 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:19 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:19 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:20 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:26 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:26 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:26 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:26 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:26 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:26 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:29 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:29 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:29 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:29 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:34 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 06:11:35 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:36 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:36 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:36 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:36 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:36 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:36 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:36 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:36 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:36 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:36 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:36 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:36 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:36 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:36 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:36 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:36 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:36 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:36 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:36 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:36 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:36 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:36 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:36 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:36 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:36 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:36 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:36 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:36 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:36 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:37 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:37 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:38 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:38 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:39 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:39 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:39 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 06:11:40 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:40 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:41 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:41 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:42 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:42 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:43 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:43 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:44 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:44 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:45 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:45 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:51 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:51 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:52 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:52 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:52 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:52 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:52 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:52 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:52 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:52 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:52 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:52 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:52 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:55 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:57 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:57 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:57 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:11:57 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 06:12:03 sh-101-19.int kernel: NFS: nfs4_reclaim_open_state: unhandled error -521
Mar 10 06:12:06 sh-101-19.int kernel: NFS: nfs4_reclaim_open_state: unhandled error -521
Mar 10 07:48:48 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 07:48:48 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 07:48:48 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 07:48:48 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 07:48:48 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 07:48:50 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 07:48:52 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 07:48:53 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 07:48:53 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 07:48:53 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 07:48:53 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 07:48:53 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 07:48:54 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 07:48:54 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 07:48:55 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 07:50:10 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 07:51:48 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 07:51:48 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 07:51:53 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 07:52:06 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 07:53:28 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 07:54:28 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 07:54:48 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 07:54:58 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 07:55:01 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 07:56:45 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 07:57:48 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 07:58:03 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 07:58:18 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 07:59:59 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 08:00:48 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 08:01:08 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 08:01:35 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 08:03:17 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 08:03:48 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 08:04:13 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 08:05:00 sh-101-19.int kernel: nfs: server srcf.isilon not responding,
still trying Mar 10 08:06:35 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 08:06:49 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 08:07:18 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 08:08:15 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 08:09:49 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 08:09:55 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 08:10:23 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 08:11:35 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 08:12:49 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 08:13:20 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 08:13:28 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 08:14:55 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 08:15:49 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 08:16:32 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 08:16:34 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 08:18:16 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 08:18:49 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 08:19:39 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 08:19:49 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 08:21:27 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 08:21:49 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 08:22:44 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 08:23:06 
sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 08:24:40 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 08:24:49 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 08:25:49 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 08:26:15 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 08:27:49 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 08:27:50 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 08:28:54 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 08:29:24 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 08:30:49 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 08:31:00 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 08:31:59 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 08:32:33 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 08:33:49 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 08:34:10 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 08:35:04 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 08:35:44 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 08:36:49 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 08:37:18 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 08:38:09 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 08:38:51 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 08:39:49 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 08:40:30 sh-101-19.int kernel: nfs: 
server srcf.isilon not responding, still trying Mar 10 08:41:15 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 08:42:01 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 08:42:50 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 08:43:38 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 08:44:20 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 08:45:12 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 08:45:50 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 08:46:45 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 08:47:25 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 08:48:20 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 08:48:50 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 08:49:54 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 08:50:30 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 08:51:31 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 08:51:50 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 08:53:03 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 08:53:35 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 08:54:40 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 08:54:50 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 08:56:16 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 08:56:41 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 08:57:50 sh-101-19.int kernel: nfs: server srcf.isilon not 
responding, still trying Mar 10 08:57:50 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 08:59:22 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 08:59:46 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 09:00:50 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 09:00:58 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 09:02:30 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 09:02:51 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 09:03:50 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 09:04:06 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 09:05:40 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 09:05:56 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 09:06:50 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 09:07:17 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 09:08:49 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 09:09:01 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 09:09:50 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 09:10:26 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 09:12:01 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 09:12:06 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 09:12:50 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 09:13:37 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 09:15:10 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 
10 09:15:11 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 09:15:51 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 09:16:43 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 09:18:17 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 09:18:19 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 09:18:51 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 09:19:52 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 09:21:22 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 09:21:27 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 09:21:51 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 09:23:02 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 09:24:27 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 09:24:39 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 09:24:51 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 09:26:10 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 09:27:32 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 09:27:45 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 09:27:51 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 09:29:20 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 09:30:37 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 09:30:51 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 09:30:57 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 09:32:31 sh-101-19.int kernel: 
nfs: server srcf.isilon not responding, still trying Mar 10 09:33:43 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 09:33:51 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 09:34:06 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 09:35:40 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 09:36:48 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 09:36:51 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 09:37:13 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 09:38:51 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 09:39:51 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 09:39:53 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 09:40:25 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 09:41:57 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 09:42:51 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 09:42:58 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 09:43:31 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 09:45:09 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 09:45:51 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 09:46:03 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 09:46:43 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 09:48:15 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 09:48:51 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 09:49:08 sh-101-19.int kernel: nfs: server srcf.isilon not 
responding, timed out Mar 10 09:49:52 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 09:51:25 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 09:51:52 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 09:52:13 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 09:52:59 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 09:54:33 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 09:54:52 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 09:55:18 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 09:56:09 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 09:57:44 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 09:57:52 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 09:58:23 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 09:59:17 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 10:00:52 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 10:00:54 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 10:01:29 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 10:02:26 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 10:03:52 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 10:04:01 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 10:04:34 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 10:05:39 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 10:06:52 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 
10:07:13 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 10:07:39 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 10:08:45 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 10:09:52 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 10:10:19 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 10:10:44 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 10:11:54 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 10:12:52 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 10:13:30 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 10:13:49 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 10:15:03 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 10:15:52 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 10:16:39 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 10:16:54 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 10:18:13 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 10:18:52 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 10:19:48 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 10:19:59 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 10:21:22 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 10:21:52 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 10:22:58 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 10:23:04 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 10:24:31 sh-101-19.int kernel: 
nfs: server srcf.isilon not responding, still trying Mar 10 10:24:52 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 10:26:07 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 10:26:10 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 10:27:41 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 10:27:53 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 10:29:14 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 10:29:15 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 10:30:51 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 10:30:53 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 10:32:20 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 10:32:25 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 10:33:53 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 10:34:01 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 10:35:25 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 10:35:34 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 10:36:53 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 10:37:09 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 10:38:30 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 10:38:44 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 10:39:53 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 10:40:20 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 10:41:36 sh-101-19.int kernel: nfs: server srcf.isilon not 
responding, timed out Mar 10 10:41:53 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 10:42:53 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 10:43:28 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 10:44:41 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 10:45:02 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 10:45:53 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 10:46:35 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 10:47:46 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 10:48:10 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 10:48:53 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 10:49:47 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 10:50:51 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 10:51:20 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 10:51:53 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 10:52:53 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 10:53:56 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 10:54:27 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 10:54:53 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 10:56:03 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 10:57:01 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 10:57:37 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 10:57:53 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 
10:59:12 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 11:00:06 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 11:00:47 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 11:00:53 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 11:02:22 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 11:03:11 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 11:03:53 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 11:03:55 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 11:05:31 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 11:06:17 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 11:06:54 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 11:07:04 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 11:08:42 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 11:09:22 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 11:09:54 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 11:10:14 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 11:11:49 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 11:12:27 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 11:12:54 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 11:13:25 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 11:15:01 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 11:15:33 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 11:15:54 sh-101-19.int kernel: 
nfs: server srcf.isilon not responding, timed out Mar 10 11:16:36 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 11:18:10 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 11:18:38 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 11:18:54 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 11:19:43 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 11:21:17 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 11:21:43 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 11:21:54 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 11:22:51 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 11:24:25 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 11:24:48 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 11:24:54 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 11:26:01 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 11:27:34 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 11:27:53 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 11:27:54 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 11:29:08 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 11:30:43 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 11:30:54 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 11:30:58 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 11:32:17 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 11:33:52 sh-101-19.int kernel: nfs: server srcf.isilon not 
responding, still trying Mar 10 11:33:54 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 11:34:03 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 11:35:28 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 11:36:54 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 11:37:04 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 11:37:08 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 11:38:35 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 11:39:54 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 11:40:11 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 11:40:13 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 11:41:46 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 11:42:54 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 11:43:19 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 11:43:21 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 11:44:55 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 11:45:54 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 11:46:24 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 11:46:30 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 11:48:05 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 11:48:54 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 11:49:29 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out Mar 10 11:49:36 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying Mar 10 
Mar 10 11:51:11 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 11:51:54 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 11:52:34 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 11:52:45 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 11:54:23 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 11:54:54 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 11:55:39 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 11:55:55 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 11:57:31 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 11:57:55 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 11:58:44 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 11:59:04 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 12:00:38 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 12:00:55 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 12:01:50 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 12:02:12 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 12:03:48 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 12:03:55 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 12:04:55 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 12:05:21 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 12:06:55 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 12:06:58 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 12:08:00 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 12:08:32 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 12:09:55 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 12:10:07 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 12:11:05 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 12:11:39 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 12:12:55 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 12:13:15 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 12:14:10 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 12:14:47 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 12:15:55 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 12:16:24 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 12:17:15 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 12:17:56 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 12:18:55 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 12:19:33 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 12:20:20 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 12:21:07 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 12:21:55 sh-101-19.int kernel: nfs: server srcf.isilon not responding, timed out
Mar 10 12:22:37 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:37 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:37 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:37 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:37 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:37 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:37 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:37 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:37 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:38 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:39 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:39 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:39 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:39 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:39 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:39 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:39 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:40 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:40 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:40 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:40 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:40 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:40 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:40 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:40 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:40 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:40 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:40 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:40 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:40 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:40 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:40 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:40 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:40 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:40 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:40 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:40 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:41 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying
Mar 10 12:22:42 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:42 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:42 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:42 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:42 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:42 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:42 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:42 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:42 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:42 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:42 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:42 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:43 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:43 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:43 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:43 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:43 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:43 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:43 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:43 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:43 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:44 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:44 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:44 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:44 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:44 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:44 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:44 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:45 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:45 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:50 sh-101-19.int kernel: NFS: nfs4_reclaim_open_state: unhandled error -521
Mar 10 12:22:51 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:51 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:51 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:51 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:51 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:51 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:51 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:51 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:51 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:51 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:51 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:51 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:22:51 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:01 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:01 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:01 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:01 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:01 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:01 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:01 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:01 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:01 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:01 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:01 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:01 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:01 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:01 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:01 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:01 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:01 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:01 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:01 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:01 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:01 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:01 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:01 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:01 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:01 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:01 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:01 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:01 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:01 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:01 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:01 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:01 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:01 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:01 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:01 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:01 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:01 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:01 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:01 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:01 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:02 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:02 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:02 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:03 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:03 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:08 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:08 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:08 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:08 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:08 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:08 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:08 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:08 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:08 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:08 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:08 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:08 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:08 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:08 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:08 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:08 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:09 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:09 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:09 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:09 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:09 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:15 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:15 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:15 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:15 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:15 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:15 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:15 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:15 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:15 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:15 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:15 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:15 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:15 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:15 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:15 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:15 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:18 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:18 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:18 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:18 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:18 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:18 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:18 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:18 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:18 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:18 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:18 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:18 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:18 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:18 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:18 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:19 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:19 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:19 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:20 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:20 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:20 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:20 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:20 sh-101-19.int kernel: nfs: server srcf.isilon OK
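For anyone triaging this log: the NFS messages above form matched "not responding" / "OK" spans, so the outage duration can be computed mechanically. Below is a minimal sketch, assuming the journal lines are available as Python strings; the regex, the `outage_windows` helper, and the hard-coded year are illustrative assumptions, not anything taken from a tool shown in this log.

```python
import re
from datetime import datetime

# Matches kernel NFS state-change lines like the ones above.
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) \S+ kernel: "
    r"nfs: server (?P<srv>\S+) (?P<state>not responding.*|OK)$"
)

def outage_windows(lines, year=2019):
    """Yield (server, start, end) for each span from the first
    'not responding' message to the next 'OK'."""
    start = {}
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        srv = m["srv"]
        if m["state"].startswith("not responding"):
            start.setdefault(srv, ts)  # keep the earliest complaint
        elif srv in start:
            yield srv, start.pop(srv), ts

demo = [
    "Mar 10 11:51:11 sh-101-19.int kernel: nfs: server srcf.isilon not responding, still trying",
    "Mar 10 12:22:37 sh-101-19.int kernel: nfs: server srcf.isilon OK",
]
for srv, t0, t1 in outage_windows(demo):
    print(srv, (t1 - t0).total_seconds() / 60)  # roughly a 31-minute outage
```

On the two demo lines (the first complaint and the first recovery above), this reports an outage of 1886 seconds for srcf.isilon.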
Mar 10 12:23:20 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:20 sh-101-19.int kernel: nfs: server srcf.isilon OK
Mar 10 12:23:43 sh-101-19.int kernel: NFS: nfs4_reclaim_open_state: unhandled error -521
Mar 10 12:35:28 sh-101-19.int kernel: NFS: nfs4_reclaim_open_state: unhandled error -521
Mar 10 16:07:35 sh-101-19.int kernel: Lustre: 173649:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552258653/real 1552258653] req@ffff9c01fce1ce00 x1626165090097696/t0(0) o36->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 488/4528 e 1 to 1 dl 1552259255 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
Mar 10 16:07:35 sh-101-19.int kernel: Lustre: 173649:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Mar 10 16:07:35 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 10 16:07:35 sh-101-19.int kernel: Lustre: Skipped 5 previous similar messages
Mar 10 16:07:35 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 10 16:07:35 sh-101-19.int kernel: Lustre: Skipped 3 previous similar messages
Mar 10 16:17:36 sh-101-19.int kernel: Lustre: 93468:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552259255/real 1552259255] req@ffff9be8bf32a400 x1626165090173056/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 480/568 e 1 to 1 dl 1552259856 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
Mar 10 16:17:36 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 10 16:17:36 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 10 16:27:37 sh-101-19.int kernel: Lustre: 93472:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552259856/real 1552259856] req@ffff9bf115ec3900 x1626165090245968/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 480/568 e 1 to 1 dl 1552260457 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
Mar 10 16:27:37 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 10 16:27:37 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 10 16:28:36 sh-101-19.int kernel: LustreError: 46294:0:(lmv_obd.c:1412:lmv_statfs()) can't stat MDS #0 (fir-MDT0003-mdc-ffff9c05b8776000), error -4
Mar 10 16:28:36 sh-101-19.int kernel: LustreError: 46294:0:(llite_lib.c:1807:ll_statfs_internal()) md_statfs fails: rc = -4
Mar 10 16:37:38 sh-101-19.int kernel: Lustre: 93472:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552260457/real 1552260457] req@ffff9bf115ec3900 x1626165090245968/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 480/568 e 1 to 1 dl 1552261058 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
Mar 10 16:37:38 sh-101-19.int kernel: Lustre: 93472:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Mar 10 16:37:38 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 10 16:37:38 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Mar 10 16:37:38 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 10 16:37:38 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Mar 10 16:47:40 sh-101-19.int kernel: Lustre: 93473:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552261058/real 1552261058] req@ffff9bf115ec3600 x1626165090379376/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 480/568 e 1 to 1 dl 1552261659 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
Mar 10 16:47:40 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 10 16:47:40 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 10 16:50:11 sh-101-19.int kernel: Lustre: Evicted from MGS (at MGC10.0.10.51@o2ib7_0) after server handle changed from 0xb7044c61619b1001 to 0x253aba2c065219fb
Mar 10 16:57:41 sh-101-19.int kernel: Lustre: 93473:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552261660/real 1552261660] req@ffff9bf115ec3600 x1626165090379376/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 480/568 e 1 to 1 dl 1552262261 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
Mar 10 16:57:41 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 10 16:57:41 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 10 16:57:41 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Mar 10 17:07:42 sh-101-19.int kernel: Lustre: 93473:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552262261/real 1552262261] req@ffff9bf115ec3600 x1626165090379376/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 480/568 e 1 to 1 dl 1552262862 ref 1 fl Rpc:X/2/ffffffff rc -11/-1
Mar 10 17:07:42 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 10 17:07:42 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 10 17:17:43 sh-101-19.int kernel: Lustre: 93476:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552262862/real 1552262862] req@ffff9bf28a042d00 x1626165090591408/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 584/2088 e 1 to 1 dl 1552263463 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
Mar 10 17:17:43 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 10 17:17:43 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Mar 10 17:17:43 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 10 17:17:43 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Mar 10 17:17:43 sh-101-19.int kernel: Lustre: 93476:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
Mar 10 17:18:04 sh-101-19.int kernel: LustreError: 11-0: fir-MDT0002-mdc-ffff9c05b8776000: operation ldlm_enqueue to node 10.0.10.51@o2ib7 failed: rc = -19
Mar 10 17:18:04 sh-101-19.int kernel: LustreError: Skipped 2 previous similar messages
Mar 10 17:19:31 sh-101-19.int kernel: LustreError: 52711:0:(lmv_obd.c:1412:lmv_statfs()) can't stat MDS #0 (fir-MDT0003-mdc-ffff9c05b8776000), error -19
Mar 10 17:19:31 sh-101-19.int kernel: LustreError: 52711:0:(llite_lib.c:1807:ll_statfs_internal()) md_statfs fails: rc = -19
Mar 10 17:20:25 sh-101-19.int kernel: LustreError: 93465:0:(client.c:3023:ptlrpc_replay_interpret()) @@@ status 301, old was 0 req@ffff9be62c4baa00 x1626164765805296/t68839036844(68839036844) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 1768/560 e 0 to 0 dl 1552264381 ref 2 fl Interpret:RP/4/0 rc 301/301
Mar 10 17:20:25 sh-101-19.int kernel: LustreError: 93465:0:(client.c:3023:ptlrpc_replay_interpret()) Skipped 1 previous similar message
Mar 10 17:45:03 sh-101-19.int kernel: LustreError: 11-0: fir-MDT0001-mdc-ffff9c05b8776000: operation ldlm_enqueue to node 10.0.10.52@o2ib7 failed: rc = -19
Mar 10 17:45:03 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 10 17:45:03 sh-101-19.int kernel: Lustre: Skipped 4 previous similar messages
Mar 10 17:45:03 sh-101-19.int kernel: LustreError: Skipped 7 previous similar messages
Mar 10 17:45:42 sh-101-19.int kernel: LustreError: 93465:0:(import.c:1247:ptlrpc_connect_interpret()) fir-MDT0000_UUID went back in time (transno 73045520170 was previously committed, server now claims 64435055525)! See https://bugzilla.lustre.org/show_bug.cgi?id=9646
Mar 10 17:46:48 sh-101-19.int kernel: LustreError: 93465:0:(import.c:1247:ptlrpc_connect_interpret()) fir-MDT0003_UUID went back in time (transno 74041117087 was previously committed, server now claims 68840102079)! See https://bugzilla.lustre.org/show_bug.cgi?id=9646
Mar 10 17:46:48 sh-101-19.int kernel: LustreError: 93465:0:(client.c:3023:ptlrpc_replay_interpret()) @@@ status 301, old was 0 req@ffff9be62c4baa00 x1626164765805296/t68839036844(68839036844) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 1768/560 e 0 to 0 dl 1552265964 ref 2 fl Interpret:RP/4/0 rc 301/301
Mar 10 17:46:48 sh-101-19.int kernel: LustreError: 93465:0:(client.c:3023:ptlrpc_replay_interpret()) Skipped 10 previous similar messages
Mar 10 17:47:30 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7)
Mar 10 17:47:30 sh-101-19.int kernel: Lustre: Skipped 4 previous similar messages
Mar 10 17:57:56 sh-101-19.int kernel: Lustre: 93473:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552265119/real 1552265119] req@ffff9bf2588fd400 x1626165115764848/t0(0) o400->fir-MDT0000-mdc-ffff9c05b8776000@10.0.10.51@o2ib7:12/10 lens 224/224 e 0 to 1 dl 1552265875 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Mar 10 17:57:56 sh-101-19.int kernel: Lustre: 93473:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Mar 10 20:53:55 sh-101-19.int kernel: python invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0
Mar 10 20:53:55 sh-101-19.int kernel: python cpuset=step_0 mems_allowed=0-1
Mar 10 20:53:55 sh-101-19.int kernel: CPU: 7 PID: 51166 Comm: python Kdump: loaded Tainted: G OE ------------ T 3.10.0-957.5.1.el7.x86_64 #1
Mar 10 20:53:55 sh-101-19.int kernel: Hardware name: Dell Inc. PowerEdge C6320/082F9M, BIOS 2.8.0 05/28/2018
Mar 10 20:53:55 sh-101-19.int kernel: Call Trace:
Mar 10 20:53:55 sh-101-19.int kernel: [] dump_stack+0x19/0x1b
Mar 10 20:53:55 sh-101-19.int kernel: [] dump_header+0x90/0x229
Mar 10 20:53:55 sh-101-19.int kernel: [] ? default_wake_function+0x12/0x20
Mar 10 20:53:55 sh-101-19.int kernel: [] ? find_lock_task_mm+0x56/0xc0
Mar 10 20:53:55 sh-101-19.int kernel: [] ? try_get_mem_cgroup_from_mm+0x28/0x60
Mar 10 20:53:55 sh-101-19.int kernel: [] oom_kill_process+0x254/0x3d0
Mar 10 20:53:55 sh-101-19.int kernel: [] mem_cgroup_oom_synchronize+0x546/0x570
Mar 10 20:53:55 sh-101-19.int kernel: [] ? mem_cgroup_charge_common+0xc0/0xc0
Mar 10 20:53:55 sh-101-19.int kernel: [] pagefault_out_of_memory+0x14/0x90
Mar 10 20:53:55 sh-101-19.int kernel: [] mm_fault_error+0x6a/0x157
Mar 10 20:53:55 sh-101-19.int kernel: [] __do_page_fault+0x3c8/0x500
Mar 10 20:53:55 sh-101-19.int kernel: [] do_page_fault+0x35/0x90
Mar 10 20:53:55 sh-101-19.int kernel: [] page_fault+0x28/0x30
Mar 10 20:53:55 sh-101-19.int kernel: Task in /slurm/uid_30356/job_38855416/step_0/task_0 killed as a result of limit of /slurm/uid_30356/job_38855416
Mar 10 20:53:55 sh-101-19.int kernel: memory: usage 15728640kB, limit 15728640kB, failcnt 13160
Mar 10 20:53:55 sh-101-19.int kernel: memory+swap: usage 15728640kB, limit 15728640kB, failcnt 0
Mar 10 20:53:55 sh-101-19.int kernel: kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
Mar 10 20:53:55 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_30356/job_38855416: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 10 20:53:55 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_30356/job_38855416/step_extern: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 10 20:53:55 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_30356/job_38855416/step_extern/task_0: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 10 20:53:55 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_30356/job_38855416/step_batch: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 10 20:53:55 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_30356/job_38855416/step_batch/task_0: cache:0KB rss:4496KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:2424KB active_anon:2072KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 10 20:53:55 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_30356/job_38855416/step_0: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 10 20:53:56 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_30356/job_38855416/step_0/task_0: cache:96KB rss:15724048KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:790656KB active_anon:14933392KB inactive_file:52KB active_file:44KB unevictable:0KB
Mar 10 20:53:56 sh-101-19.int kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
Mar 10 20:53:56 sh-101-19.int kernel: [51094]     0 51094    26988       89      10        0             0 sleep
Mar 10 20:53:56 sh-101-19.int kernel: [51127] 30356 51127    28334      440      14        0             0 slurm_script
Mar 10 20:53:56 sh-101-19.int kernel: [51146] 30356 51146    80977     1489      38        0             0 srun
Mar 10 20:53:56 sh-101-19.int kernel: [51148] 30356 51148    13100      218      30        0             0 srun
Mar 10 20:53:56 sh-101-19.int kernel: [51166] 30356 51166  4994223  3934415    9701        0             0 python
Mar 10 20:53:56 sh-101-19.int kernel: Memory cgroup out of memory: Kill process 51166 (python) score 1003 or sacrifice child
Mar 10 20:53:56 sh-101-19.int kernel: Killed process 51166 (python) total-vm:19976892kB, anon-rss:15723992kB, file-rss:13668kB, shmem-rss:0kB
Mar 11 16:43:27 sh-101-19.int kernel: Lustre: 45305:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552347206/real 1552347206] req@ffff9be6ef4d5a00 x1626166864483824/t0(0) o36->fir-MDT0000-mdc-ffff9c05b8776000@10.0.10.51@o2ib7:12/10 lens 536/1752 e 4 to 1 dl 1552347807 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
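The OOM report above shows a Slurm job's memory cgroup hitting its 15728640kB limit and the kernel killing PID 51166. The victim's footprint can be pulled out of the "Killed process" line; below is a minimal sketch, assuming the line is available as a Python string (the `KILL_RE` pattern and `parse_oom_kill` helper are illustrative names, not from any tool in this log).

```python
import re

# Matches the kernel's "Killed process ..." summary line from the OOM report.
KILL_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<comm>[^)]+)\) "
    r"total-vm:(?P<vm>\d+)kB, anon-rss:(?P<anon>\d+)kB, "
    r"file-rss:(?P<file>\d+)kB, shmem-rss:(?P<shmem>\d+)kB"
)

def parse_oom_kill(line):
    """Return the killed process's memory fields in kB, or None."""
    m = KILL_RE.search(line)
    if not m:
        return None
    d = {k: int(v) if v.isdigit() else v for k, v in m.groupdict().items()}
    d["rss_kb"] = d["anon"] + d["file"] + d["shmem"]  # total resident set
    return d

line = ("Mar 10 20:53:56 sh-101-19.int kernel: Killed process 51166 (python) "
        "total-vm:19976892kB, anon-rss:15723992kB, file-rss:13668kB, shmem-rss:0kB")
info = parse_oom_kill(line)
limit_kb = 15728640  # the cgroup limit reported a few lines earlier (15 GiB)
print(info["comm"], info["rss_kb"], info["rss_kb"] >= limit_kb)
```

Run against the kill line above, this shows the python process's resident set (anon + file + shmem) sitting right at the job's cgroup limit, which is why the memory controller's OOM killer fired rather than the system-wide one.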
Mar 11 16:43:27 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection to fir-MDT0000 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 11 16:43:27 sh-101-19.int kernel: Lustre: Skipped 3 previous similar messages
Mar 11 16:43:27 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7)
Mar 11 16:43:27 sh-101-19.int kernel: Lustre: Skipped 3 previous similar messages
Mar 11 16:48:44 sh-101-19.int kernel: Lustre: 192536:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552347523/real 1552347523] req@ffff9be5c5f38300 x1626166866995744/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 1768/10616 e 1 to 1 dl 1552348124 ref 2 fl Rpc:XP/0/ffffffff rc 0/-1
Mar 11 16:48:44 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 11 16:48:44 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 11 16:51:44 sh-101-19.int kernel: LustreError: 11-0: fir-MDT0001-mdc-ffff9c05b8776000: operation ldlm_enqueue to node 10.0.10.52@o2ib7 failed: rc = -19
Mar 11 16:51:44 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 11 16:51:44 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message
Mar 11 16:52:40 sh-101-19.int kernel: LustreError: 93465:0:(client.c:3023:ptlrpc_replay_interpret()) @@@ status 301, old was 0 req@ffff9c028d72f200 x1626166864468320/t81719840231(81719840231) o101->fir-MDT0000-mdc-ffff9c05b8776000@10.0.10.51@o2ib7:12/10 lens 1168/560 e 0 to 0 dl 1552349115 ref 2 fl Interpret:RP/4/0 rc 301/301
Mar 11 16:52:40 sh-101-19.int kernel: LustreError: 93465:0:(client.c:3023:ptlrpc_replay_interpret()) Skipped 10 previous similar messages
Mar 11 16:52:40 sh-101-19.int kernel: LustreError: 93465:0:(import.c:1247:ptlrpc_connect_interpret()) fir-MDT0003_UUID went back in time (transno 74041117087 was previously committed, server now claims 68840102079)! See https://bugzilla.lustre.org/show_bug.cgi?id=9646
Mar 11 16:52:41 sh-101-19.int kernel: LustreError: 93465:0:(client.c:3023:ptlrpc_replay_interpret()) @@@ status 301, old was 0 req@ffff9bf426610900 x1626164771562800/t68845058494(68845058494) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 1768/560 e 0 to 0 dl 1552349117 ref 2 fl Interpret:RP/4/0 rc 301/301
Mar 11 16:52:41 sh-101-19.int kernel: LustreError: 93465:0:(client.c:3023:ptlrpc_replay_interpret()) Skipped 3 previous similar messages
Mar 11 16:53:25 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7)
Mar 11 17:15:49 sh-101-19.int kernel: Lustre: 93468:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552349147/real 1552349147] req@ffff9bee55c93c00 x1626166874877840/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 480/568 e 1 to 1 dl 1552349749 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
Mar 11 17:15:49 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 11 17:15:49 sh-101-19.int kernel: Lustre: Skipped 3 previous similar messages
Mar 11 17:15:49 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 11 17:15:49 sh-101-19.int kernel: Lustre: Skipped 3 previous similar messages
Mar 11 17:25:50 sh-101-19.int kernel: Lustre: 93468:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552349749/real 1552349749] req@ffff9bee55c93c00 x1626166874877840/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 480/568 e 1 to 1 dl 1552350350 ref 1 fl Rpc:X/2/ffffffff rc -11/-1
Mar 11 17:25:50 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 11 17:25:50 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 11 17:26:46 sh-101-19.int kernel: Lustre: 151528:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552349650/real 1552349650] req@ffff9be5c6902400 x1626166874948208/t0(0) o36->fir-MDT0000-mdc-ffff9c05b8776000@10.0.10.51@o2ib7:12/10 lens 536/1752 e 0 to 1 dl 1552350406 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
Mar 11 17:26:46 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection to fir-MDT0000 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 11 17:26:46 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7)
Mar 11 17:38:26 sh-101-19.int kernel: Lustre: 173648:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552350350/real 1552350350] req@ffff9bfbaa4b7500 x1626166875054256/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 576/2088 e 0 to 1 dl 1552351106 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
Mar 11 17:38:26 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 11 17:38:26 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 11 17:39:22 sh-101-19.int kernel: Lustre: 151528:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552350406/real 1552350406] req@ffff9be5c6902400 x1626166874948208/t0(0) o36->fir-MDT0000-mdc-ffff9c05b8776000@10.0.10.51@o2ib7:12/10 lens 536/1752 e 0 to 1 dl 1552351162 ref 2 fl Rpc:X/2/ffffffff rc -11/-1
Mar 11 17:39:22 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection to fir-MDT0000 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 11 17:39:22 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7)
Mar 11 17:47:56 sh-101-19.int kernel: LustreError: 11-0: fir-MDT0001-mdc-ffff9c05b8776000: operation ldlm_enqueue to node 10.0.10.52@o2ib7 failed: rc = -19
Mar 11 17:47:56 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 11 17:47:56 sh-101-19.int kernel: LustreError: Skipped 2 previous similar messages
Mar 11 17:49:24 sh-101-19.int kernel: Lustre: fir-MDT0002-mdc-ffff9c05b8776000: Connection to fir-MDT0002 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 11 17:49:24 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Mar 11 17:49:24 sh-101-19.int kernel: LustreError: 93465:0:(import.c:1247:ptlrpc_connect_interpret()) fir-MDT0002_UUID went back in time (transno 90495812751 was previously committed, server now claims 60407812735)! See https://bugzilla.lustre.org/show_bug.cgi?id=9646
Mar 11 17:49:35 sh-101-19.int kernel: LustreError: 93465:0:(client.c:3023:ptlrpc_replay_interpret()) @@@ status 301, old was 0 req@ffff9c028d72f200 x1626166864468320/t81719840231(81719840231) o101->fir-MDT0000-mdc-ffff9c05b8776000@10.0.10.51@o2ib7:12/10 lens 1168/560 e 0 to 0 dl 1552352531 ref 2 fl Interpret:RP/4/0 rc 301/301
Mar 11 17:49:35 sh-101-19.int kernel: LustreError: 93465:0:(client.c:3023:ptlrpc_replay_interpret()) Skipped 7 previous similar messages
Mar 11 17:50:25 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 11 17:50:25 sh-101-19.int kernel: LustreError: 93465:0:(import.c:1247:ptlrpc_connect_interpret()) fir-MDT0003_UUID went back in time (transno 74041117087 was previously committed, server now claims 68840102079)! See https://bugzilla.lustre.org/show_bug.cgi?id=9646
Mar 11 17:50:29 sh-101-19.int kernel: LustreError: 93465:0:(client.c:3023:ptlrpc_replay_interpret()) @@@ status 301, old was 0 req@ffff9bf426610900 x1626164771562800/t68845058494(68845058494) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 1768/560 e 0 to 0 dl 1552352585 ref 2 fl Interpret:RP/4/0 rc 301/301
Mar 11 17:50:29 sh-101-19.int kernel: LustreError: 93465:0:(client.c:3023:ptlrpc_replay_interpret()) Skipped 3 previous similar messages
Mar 11 17:51:14 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 11 18:00:45 sh-101-19.int kernel: Lustre: 93474:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552351689/real 1552351689] req@ffff9be7b30a8600 x1626166875227184/t0(0) o400->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 224/224 e 0 to 1 dl 1552352445 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Mar 11 18:00:45 sh-101-19.int
kernel: Lustre: 93473:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552351689/real 1552351689] req@ffff9be7b30ae000 x1626166875227168/t0(0) o400->fir-MDT0002-mdc-ffff9c05b8776000@10.0.10.51@o2ib7:12/10 lens 224/224 e 0 to 1 dl 1552352445 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 Mar 11 18:01:10 sh-101-19.int kernel: Lustre: 93466:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552351714/real 1552351714] req@ffff9be628611e00 x1626166875230000/t0(0) o400->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 224/224 e 0 to 1 dl 1552352470 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 Mar 11 18:01:35 sh-101-19.int kernel: Lustre: 93472:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552351739/real 1552351739] req@ffff9bf5d638da00 x1626166875232816/t0(0) o400->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 224/224 e 0 to 1 dl 1552352495 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 Mar 11 18:01:35 sh-101-19.int kernel: Lustre: 93470:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552351739/real 1552351739] req@ffff9bf5d638d700 x1626166875232800/t0(0) o400->fir-MDT0002-mdc-ffff9c05b8776000@10.0.10.51@o2ib7:12/10 lens 224/224 e 0 to 1 dl 1552352495 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 Mar 11 18:01:35 sh-101-19.int kernel: Lustre: 93470:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Mar 11 18:02:00 sh-101-19.int kernel: Lustre: 93476:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552351764/real 1552351764] req@ffff9be792e3b000 x1626166875235632/t0(0) o400->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 224/224 e 0 to 1 dl 1552352520 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 Mar 11 18:02:36 sh-101-19.int kernel: Lustre: 93479:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request 
sent has timed out for slow reply: [sent 1552351800/real 1552351800] req@ffff9bfe17331b00 x1626166875238544/t0(0) o400->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 224/224 e 0 to 1 dl 1552352556 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 Mar 11 18:03:58 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7) Mar 11 18:03:58 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 11 18:04:22 sh-101-19.int kernel: Lustre: fir-MDT0002-mdc-ffff9c05b8776000: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7) Mar 11 21:01:01 sh-101-19.int kernel: python invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0 Mar 11 21:01:01 sh-101-19.int kernel: python cpuset=step_0 mems_allowed=0-1 Mar 11 21:01:01 sh-101-19.int kernel: CPU: 11 PID: 194230 Comm: python Kdump: loaded Tainted: G OE ------------ T 3.10.0-957.5.1.el7.x86_64 #1 Mar 11 21:01:01 sh-101-19.int kernel: Hardware name: Dell Inc. PowerEdge C6320/082F9M, BIOS 2.8.0 05/28/2018 Mar 11 21:01:01 sh-101-19.int kernel: Call Trace: Mar 11 21:01:01 sh-101-19.int kernel: [] dump_stack+0x19/0x1b Mar 11 21:01:01 sh-101-19.int kernel: [] dump_header+0x90/0x229 Mar 11 21:01:01 sh-101-19.int kernel: [] ? default_wake_function+0x12/0x20 Mar 11 21:01:01 sh-101-19.int kernel: [] ? find_lock_task_mm+0x56/0xc0 Mar 11 21:01:01 sh-101-19.int kernel: [] ? try_get_mem_cgroup_from_mm+0x28/0x60 Mar 11 21:01:01 sh-101-19.int kernel: [] oom_kill_process+0x254/0x3d0 Mar 11 21:01:01 sh-101-19.int kernel: [] mem_cgroup_oom_synchronize+0x546/0x570 Mar 11 21:01:01 sh-101-19.int kernel: [] ? 
mem_cgroup_charge_common+0xc0/0xc0 Mar 11 21:01:01 sh-101-19.int kernel: [] pagefault_out_of_memory+0x14/0x90 Mar 11 21:01:01 sh-101-19.int kernel: [] mm_fault_error+0x6a/0x157 Mar 11 21:01:01 sh-101-19.int kernel: [] __do_page_fault+0x3c8/0x500 Mar 11 21:01:01 sh-101-19.int kernel: [] do_page_fault+0x35/0x90 Mar 11 21:01:01 sh-101-19.int kernel: [] page_fault+0x28/0x30 Mar 11 21:01:01 sh-101-19.int kernel: Task in /slurm/uid_30356/job_38968130/step_0/task_0 killed as a result of limit of /slurm/uid_30356/job_38968130 Mar 11 21:01:01 sh-101-19.int kernel: memory: usage 15728640kB, limit 15728640kB, failcnt 7033 Mar 11 21:01:01 sh-101-19.int kernel: memory+swap: usage 15728640kB, limit 15728640kB, failcnt 0 Mar 11 21:01:01 sh-101-19.int kernel: kmem: usage 0kB, limit 9007199254740988kB, failcnt 0 Mar 11 21:01:01 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_30356/job_38968130: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB Mar 11 21:01:01 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_30356/job_38968130/step_extern: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB Mar 11 21:01:01 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_30356/job_38968130/step_extern/task_0: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB Mar 11 21:01:01 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_30356/job_38968130/step_batch: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB Mar 11 21:01:01 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_30356/job_38968130/step_batch/task_0: cache:0KB rss:4500KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:2304KB active_anon:2196KB 
inactive_file:0KB active_file:0KB unevictable:0KB Mar 11 21:01:01 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_30356/job_38968130/step_0: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB Mar 11 21:01:02 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_30356/job_38968130/step_0/task_0: cache:0KB rss:15724140KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:161920KB active_anon:15562220KB inactive_file:0KB active_file:0KB unevictable:0KB Mar 11 21:01:02 sh-101-19.int kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name Mar 11 21:01:02 sh-101-19.int kernel: [194176] 0 194176 26988 89 11 0 0 sleep Mar 11 21:01:02 sh-101-19.int kernel: [194186] 30356 194186 28334 441 13 0 0 slurm_script Mar 11 21:01:02 sh-101-19.int kernel: [194205] 30356 194205 80977 1484 39 0 0 srun Mar 11 21:01:02 sh-101-19.int kernel: [194212] 30356 194212 13100 218 30 0 0 srun Mar 11 21:01:02 sh-101-19.int kernel: [194230] 30356 194230 4491720 3935779 8719 0 0 python Mar 11 21:01:02 sh-101-19.int kernel: Memory cgroup out of memory: Kill process 194230 (python) score 1003 or sacrifice child Mar 11 21:01:02 sh-101-19.int kernel: Killed process 194230 (python) total-vm:17966880kB, anon-rss:15724052kB, file-rss:19064kB, shmem-rss:0kB Mar 12 08:54:26 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection to fir-MDT0000 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 12 08:54:26 sh-101-19.int kernel: Lustre: Skipped 3 previous similar messages Mar 12 08:54:58 sh-101-19.int kernel: Lustre: 93483:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552406091/real 1552406091] req@ffff9bfc5ba08c00 x1626167178224368/t0(0) o400->MGC10.0.10.51@o2ib7@10.0.10.51@o2ib7:26/25 lens 224/224 e 0 to 1 dl 1552406098 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 Mar 
12 08:54:58 sh-101-19.int kernel: LustreError: 166-1: MGC10.0.10.51@o2ib7: Connection to MGS (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will fail Mar 12 08:55:49 sh-101-19.int kernel: Lustre: Evicted from MGS (at MGC10.0.10.51@o2ib7_0) after server handle changed from 0x253aba2c065219fb to 0x4cd4038d7c2bcfef Mar 12 08:55:49 sh-101-19.int kernel: Lustre: MGC10.0.10.51@o2ib7: Connection restored to MGC10.0.10.51@o2ib7_0 (at 10.0.10.51@o2ib7) Mar 12 08:56:56 sh-101-19.int kernel: LustreError: 93465:0:(import.c:1247:ptlrpc_connect_interpret()) fir-MDT0003_UUID went back in time (transno 74041117087 was previously committed, server now claims 68840102079)! See https://bugzilla.lustre.org/show_bug.cgi?id=9646 Mar 12 08:56:56 sh-101-19.int kernel: LustreError: 93465:0:(client.c:3023:ptlrpc_replay_interpret()) @@@ status 301, old was 0 req@ffff9bf2fc78d400 x1626166955172240/t85912044117(85912044117) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 1768/560 e 0 to 0 dl 1552406262 ref 2 fl Interpret:RP/4/0 rc 301/301 Mar 12 08:56:56 sh-101-19.int kernel: LustreError: 93465:0:(client.c:3023:ptlrpc_replay_interpret()) Skipped 8 previous similar messages Mar 12 08:57:38 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7) Mar 12 14:11:48 sh-101-19.int kernel: Lustre: 93478:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552425101/real 1552425101] req@ffff9bf828c15400 x1626167185241840/t0(0) o400->fir-MDT0002-mdc-ffff9c05b8776000@10.0.10.51@o2ib7:12/10 lens 224/224 e 0 to 1 dl 1552425108 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 Mar 12 14:11:48 sh-101-19.int kernel: Lustre: 93485:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552425101/real 1552425101] req@ffff9bf828c13300 x1626167185241792/t0(0) o400->MGC10.0.10.51@o2ib7@10.0.10.51@o2ib7:26/25 lens 224/224 e 0 
to 1 dl 1552425108 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 Mar 12 14:11:48 sh-101-19.int kernel: LustreError: 166-1: MGC10.0.10.51@o2ib7: Connection to MGS (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will fail Mar 12 14:11:48 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection to fir-MDT0000 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 12 14:11:48 sh-101-19.int kernel: Lustre: 93478:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Mar 12 14:11:49 sh-101-19.int kernel: Lustre: 93479:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552425101/real 1552425101] req@ffff9bf828c16600 x1626167185241856/t0(0) o400->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 224/224 e 0 to 1 dl 1552425109 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 Mar 12 14:11:49 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 12 14:11:49 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 12 14:11:49 sh-101-19.int kernel: Lustre: 93479:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Mar 12 14:12:39 sh-101-19.int kernel: Lustre: Evicted from MGS (at MGC10.0.10.51@o2ib7_0) after server handle changed from 0x4cd4038d7c2bcfef to 0x974d7e52602357 Mar 12 14:12:39 sh-101-19.int kernel: LustreError: 93465:0:(import.c:1247:ptlrpc_connect_interpret()) fir-MDT0003_UUID went back in time (transno 74041117087 was previously committed, server now claims 68840102079)! 
See https://bugzilla.lustre.org/show_bug.cgi?id=9646 Mar 12 14:12:39 sh-101-19.int kernel: Lustre: MGC10.0.10.51@o2ib7: Connection restored to MGC10.0.10.51@o2ib7_0 (at 10.0.10.51@o2ib7) Mar 12 14:12:39 sh-101-19.int kernel: Lustre: Skipped 3 previous similar messages Mar 12 14:12:39 sh-101-19.int kernel: LustreError: 93465:0:(client.c:3023:ptlrpc_replay_interpret()) @@@ status 301, old was 0 req@ffff9bf2fc78d400 x1626166955172240/t85912044117(85912044117) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 1768/560 e 0 to 0 dl 1552425190 ref 2 fl Interpret:RP/4/0 rc 301/301 Mar 12 14:12:39 sh-101-19.int kernel: LustreError: 93465:0:(client.c:3023:ptlrpc_replay_interpret()) Skipped 5 previous similar messages Mar 12 14:12:55 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 12 14:12:58 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7) Mar 12 14:12:58 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 12 18:50:06 sh-101-19.int kernel: Lustre: 89529:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552441775/real 1552441775] req@ffff9bf47553bf00 x1626167194035008/t0(0) o36->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 528/1752 e 0 to 1 dl 1552441806 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Mar 12 18:50:06 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 12 18:50:06 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 12 18:50:06 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 12 18:50:06 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 12 18:50:37 
sh-101-19.int kernel: Lustre: 89529:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552441806/real 1552441806] req@ffff9bf47553bf00 x1626167194035008/t0(0) o36->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 528/1752 e 0 to 1 dl 1552441837 ref 2 fl Rpc:X/2/ffffffff rc 0/-1 Mar 12 18:50:37 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 12 18:50:37 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 12 18:51:08 sh-101-19.int kernel: Lustre: 89529:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552441837/real 1552441837] req@ffff9bf47553bf00 x1626167194035008/t0(0) o36->fir-MDT0003-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 528/1752 e 0 to 1 dl 1552441868 ref 2 fl Rpc:X/2/ffffffff rc -11/-1 Mar 12 18:51:08 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 12 18:51:08 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 12 20:04:57 sh-101-19.int kernel: EXT4-fs (loop1): mounting ext3 file system using the ext4 subsystem Mar 12 20:04:57 sh-101-19.int kernel: EXT4-fs (loop1): mounted filesystem with ordered data mode. 
Opts: errors=remount-ro Mar 12 20:25:46 sh-101-19.int kernel: stata-mp invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0 Mar 12 20:25:46 sh-101-19.int kernel: stata-mp cpuset=step_batch mems_allowed=0-1 Mar 12 20:25:46 sh-101-19.int kernel: CPU: 16 PID: 139713 Comm: stata-mp Kdump: loaded Tainted: G OE ------------ T 3.10.0-957.5.1.el7.x86_64 #1 Mar 12 20:25:46 sh-101-19.int kernel: Hardware name: Dell Inc. PowerEdge C6320/082F9M, BIOS 2.8.0 05/28/2018 Mar 12 20:25:46 sh-101-19.int kernel: Call Trace: Mar 12 20:25:46 sh-101-19.int kernel: [] dump_stack+0x19/0x1b Mar 12 20:25:46 sh-101-19.int kernel: [] dump_header+0x90/0x229 Mar 12 20:25:46 sh-101-19.int kernel: [] ? default_wake_function+0x12/0x20 Mar 12 20:25:46 sh-101-19.int kernel: [] ? find_lock_task_mm+0x56/0xc0 Mar 12 20:25:46 sh-101-19.int kernel: [] ? try_get_mem_cgroup_from_mm+0x28/0x60 Mar 12 20:25:46 sh-101-19.int kernel: [] oom_kill_process+0x254/0x3d0 Mar 12 20:25:46 sh-101-19.int kernel: [] mem_cgroup_oom_synchronize+0x546/0x570 Mar 12 20:25:46 sh-101-19.int kernel: [] ? 
mem_cgroup_charge_common+0xc0/0xc0 Mar 12 20:25:46 sh-101-19.int kernel: [] pagefault_out_of_memory+0x14/0x90 Mar 12 20:25:46 sh-101-19.int kernel: [] mm_fault_error+0x6a/0x157 Mar 12 20:25:46 sh-101-19.int kernel: [] __do_page_fault+0x3c8/0x500 Mar 12 20:25:46 sh-101-19.int kernel: [] do_page_fault+0x35/0x90 Mar 12 20:25:46 sh-101-19.int kernel: [] page_fault+0x28/0x30 Mar 12 20:25:46 sh-101-19.int kernel: Task in /slurm/uid_339679/job_38942743/step_batch/task_0 killed as a result of limit of /slurm/uid_339679/job_38942743/step_batch Mar 12 20:25:46 sh-101-19.int kernel: memory: usage 83886080kB, limit 83886080kB, failcnt 3426806 Mar 12 20:25:46 sh-101-19.int kernel: memory+swap: usage 83886080kB, limit 83886080kB, failcnt 1 Mar 12 20:25:46 sh-101-19.int kernel: kmem: usage 0kB, limit 9007199254740988kB, failcnt 0 Mar 12 20:25:46 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_339679/job_38942743/step_batch: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB Mar 12 20:25:46 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_339679/job_38942743/step_batch/task_0: cache:364KB rss:83885716KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:4593036KB active_anon:79292692KB inactive_file:172KB active_file:168KB unevictable:0KB Mar 12 20:25:46 sh-101-19.int kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name Mar 12 20:25:46 sh-101-19.int kernel: [139629] 339679 139629 28329 410 14 0 0 slurm_script Mar 12 20:25:46 sh-101-19.int kernel: [139664] 339679 139664 28329 414 13 0 0 stata-mp Mar 12 20:25:46 sh-101-19.int kernel: [139684] 339679 139684 94758 2171 33 0 0 starter-suid Mar 12 20:25:46 sh-101-19.int kernel: [139713] 339679 139713 22424348 20970245 41215 0 0 stata-mp Mar 12 20:25:46 sh-101-19.int kernel: Memory cgroup out of memory: Kill process 139713 (stata-mp) score 1001 or sacrifice child Mar 12 20:25:46 sh-101-19.int kernel: Killed 
process 139713 (stata-mp) total-vm:89697392kB, anon-rss:83880980kB, file-rss:0kB, shmem-rss:0kB Mar 12 21:39:50 sh-101-19.int kernel: EXT4-fs (loop1): mounting ext3 file system using the ext4 subsystem Mar 12 21:39:50 sh-101-19.int kernel: EXT4-fs (loop1): mounted filesystem with ordered data mode. Opts: errors=remount-ro Mar 13 04:57:28 sh-101-19.int kernel: EXT4-fs (loop1): mounting ext3 file system using the ext4 subsystem Mar 13 04:57:28 sh-101-19.int kernel: EXT4-fs (loop1): mounted filesystem with ordered data mode. Opts: errors=remount-ro Mar 13 11:07:52 sh-101-19.int kernel: ixgbe 0000:81:00.0 em1: changing MTU from 1500 to 9000 Mar 13 11:07:53 sh-101-19.int kernel: ixgbe 0000:81:00.0 em1: detected SFP+: 9 Mar 13 11:07:55 sh-101-19.int kernel: ixgbe 0000:81:00.0 em1: NIC Link is Up 1 Gbps, Flow Control: RX/TX Mar 13 12:10:03 sh-101-19.int kernel: julia invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0 Mar 13 12:10:03 sh-101-19.int kernel: julia cpuset=step_0 mems_allowed=0-1 Mar 13 12:10:03 sh-101-19.int kernel: CPU: 14 PID: 95763 Comm: julia Kdump: loaded Tainted: G OE ------------ T 3.10.0-957.5.1.el7.x86_64 #1 Mar 13 12:10:03 sh-101-19.int kernel: Hardware name: Dell Inc. PowerEdge C6320/082F9M, BIOS 2.8.0 05/28/2018 Mar 13 12:10:03 sh-101-19.int kernel: Call Trace: Mar 13 12:10:03 sh-101-19.int kernel: [] dump_stack+0x19/0x1b Mar 13 12:10:03 sh-101-19.int kernel: [] dump_header+0x90/0x229 Mar 13 12:10:03 sh-101-19.int kernel: [] ? find_lock_task_mm+0x56/0xc0 Mar 13 12:10:03 sh-101-19.int kernel: [] ? try_get_mem_cgroup_from_mm+0x28/0x60 Mar 13 12:10:03 sh-101-19.int kernel: [] oom_kill_process+0x254/0x3d0 Mar 13 12:10:03 sh-101-19.int kernel: [] mem_cgroup_oom_synchronize+0x546/0x570 Mar 13 12:10:03 sh-101-19.int kernel: [] ? 
mem_cgroup_charge_common+0xc0/0xc0 Mar 13 12:10:03 sh-101-19.int kernel: [] pagefault_out_of_memory+0x14/0x90 Mar 13 12:10:03 sh-101-19.int kernel: [] mm_fault_error+0x6a/0x157 Mar 13 12:10:03 sh-101-19.int kernel: [] __do_page_fault+0x3c8/0x500 Mar 13 12:10:03 sh-101-19.int kernel: [] do_page_fault+0x35/0x90 Mar 13 12:10:03 sh-101-19.int kernel: [] page_fault+0x28/0x30 Mar 13 12:10:03 sh-101-19.int kernel: Task in /slurm/uid_286587/job_39045654/step_0/task_0 killed as a result of limit of /slurm/uid_286587/job_39045654 Mar 13 12:10:03 sh-101-19.int kernel: memory: usage 40960000kB, limit 40960000kB, failcnt 8917 Mar 13 12:10:03 sh-101-19.int kernel: memory+swap: usage 40960000kB, limit 40960000kB, failcnt 0 Mar 13 12:10:03 sh-101-19.int kernel: kmem: usage 0kB, limit 9007199254740988kB, failcnt 0 Mar 13 12:10:03 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39045654: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB Mar 13 12:10:03 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39045654/step_extern: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB Mar 13 12:10:03 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39045654/step_extern/task_0: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB Mar 13 12:10:03 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39045654/step_batch: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB Mar 13 12:10:03 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39045654/step_batch/task_0: cache:0KB rss:4312KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:2188KB active_anon:2124KB 
inactive_file:0KB active_file:0KB unevictable:0KB Mar 13 12:10:03 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39045654/step_0: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB Mar 13 12:10:03 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39045654/step_0/task_0: cache:5196KB rss:40950492KB rss_huge:0KB mapped_file:4260KB swap:0KB inactive_anon:742304KB active_anon:40211552KB inactive_file:600KB active_file:336KB unevictable:0KB Mar 13 12:10:03 sh-101-19.int kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name Mar 13 12:10:03 sh-101-19.int kernel: [95676] 0 95676 26988 89 11 0 0 sleep Mar 13 12:10:03 sh-101-19.int kernel: [95710] 286587 95710 28329 409 14 0 0 slurm_script Mar 13 12:10:03 sh-101-19.int kernel: [95716] 286587 95716 80960 1482 39 0 0 srun Mar 13 12:10:04 sh-101-19.int kernel: [95718] 286587 95718 13093 217 28 0 0 srun Mar 13 12:10:04 sh-101-19.int kernel: [95739] 286587 95739 202995 71223 239 0 0 julia Mar 13 12:10:04 sh-101-19.int kernel: [95761] 286587 95761 3524782 3433963 6803 0 0 julia Mar 13 12:10:04 sh-101-19.int kernel: [95762] 286587 95762 3471929 3356467 6655 0 0 julia Mar 13 12:10:04 sh-101-19.int kernel: [95763] 286587 95763 3541248 3425846 6788 0 0 julia Mar 13 12:10:04 sh-101-19.int kernel: Memory cgroup out of memory: Kill process 95764 (julia) score 336 or sacrifice child Mar 13 12:10:04 sh-101-19.int kernel: Killed process 95761 (julia) total-vm:14099128kB, anon-rss:13685844kB, file-rss:47960kB, shmem-rss:2048kB Mar 13 12:33:26 sh-101-19.int kernel: julia invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0 Mar 13 12:33:26 sh-101-19.int kernel: julia cpuset=step_0 mems_allowed=0-1 Mar 13 12:33:26 sh-101-19.int kernel: CPU: 16 PID: 99102 Comm: julia Kdump: loaded Tainted: G OE ------------ T 3.10.0-957.5.1.el7.x86_64 #1 Mar 13 12:33:26 sh-101-19.int kernel: Hardware name: 
Dell Inc. PowerEdge C6320/082F9M, BIOS 2.8.0 05/28/2018 Mar 13 12:33:26 sh-101-19.int kernel: Call Trace: Mar 13 12:33:26 sh-101-19.int kernel: [] dump_stack+0x19/0x1b Mar 13 12:33:26 sh-101-19.int kernel: [] dump_header+0x90/0x229 Mar 13 12:33:26 sh-101-19.int kernel: [] ? default_wake_function+0x12/0x20 Mar 13 12:33:26 sh-101-19.int kernel: [] ? find_lock_task_mm+0x56/0xc0 Mar 13 12:33:26 sh-101-19.int kernel: [] ? try_get_mem_cgroup_from_mm+0x28/0x60 Mar 13 12:33:26 sh-101-19.int kernel: [] oom_kill_process+0x254/0x3d0 Mar 13 12:33:26 sh-101-19.int kernel: [] mem_cgroup_oom_synchronize+0x546/0x570 Mar 13 12:33:26 sh-101-19.int kernel: [] ? mem_cgroup_charge_common+0xc0/0xc0 Mar 13 12:33:26 sh-101-19.int kernel: [] pagefault_out_of_memory+0x14/0x90 Mar 13 12:33:26 sh-101-19.int kernel: [] mm_fault_error+0x6a/0x157 Mar 13 12:33:26 sh-101-19.int kernel: [] __do_page_fault+0x3c8/0x500 Mar 13 12:33:26 sh-101-19.int kernel: [] do_page_fault+0x35/0x90 Mar 13 12:33:26 sh-101-19.int kernel: [] page_fault+0x28/0x30 Mar 13 12:33:26 sh-101-19.int kernel: Task in /slurm/uid_286587/job_39045684/step_0/task_0 killed as a result of limit of /slurm/uid_286587/job_39045684 Mar 13 12:33:26 sh-101-19.int kernel: memory: usage 40960000kB, limit 40960000kB, failcnt 17266 Mar 13 12:33:26 sh-101-19.int kernel: memory+swap: usage 40960000kB, limit 40960000kB, failcnt 0 Mar 13 12:33:26 sh-101-19.int kernel: kmem: usage 0kB, limit 9007199254740988kB, failcnt 0 Mar 13 12:33:26 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39045684: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB Mar 13 12:33:26 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39045684/step_extern: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB Mar 13 12:33:26 sh-101-19.int kernel: Memory cgroup stats 
for /slurm/uid_286587/job_39045684/step_extern/task_0: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 13 12:33:26 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39045684/step_batch: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 13 12:33:26 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39045684/step_batch/task_0: cache:0KB rss:4304KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:2292KB active_anon:2012KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 13 12:33:26 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39045684/step_0: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 13 12:33:26 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39045684/step_0/task_0: cache:4300KB rss:40951396KB rss_huge:0KB mapped_file:4260KB swap:0KB inactive_anon:2892752KB active_anon:38062900KB inactive_file:28KB active_file:0KB unevictable:0KB
Mar 13 12:33:26 sh-101-19.int kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Mar 13 12:33:26 sh-101-19.int kernel: [99062] 0 99062 26988 88 9 0 0 sleep
Mar 13 12:33:26 sh-101-19.int kernel: [99072] 286587 99072 28329 409 13 0 0 slurm_script
Mar 13 12:33:26 sh-101-19.int kernel: [99078] 286587 99078 80960 1473 38 0 0 srun
Mar 13 12:33:26 sh-101-19.int kernel: [99080] 286587 99080 13093 215 29 0 0 srun
Mar 13 12:33:26 sh-101-19.int kernel: [99099] 286587 99099 219506 71784 240 0 0 julia
Mar 13 12:33:26 sh-101-19.int kernel: [99102] 286587 99102 3518862 3428400 6791 0 0 julia
Mar 13 12:33:26 sh-101-19.int kernel: [99103] 286587 99103 3493200 3378142 6698 0 0 julia
Mar 13 12:33:26 sh-101-19.int kernel: [99104] 286587 99104 3558967 3411237 6767 0 0 julia
Mar 13 12:33:26 sh-101-19.int kernel: Memory cgroup out of memory: Kill process 99105 (julia) score 335 or sacrifice child
Mar 13 12:33:26 sh-101-19.int kernel: Killed process 99102 (julia) total-vm:14075448kB, anon-rss:13661924kB, file-rss:49628kB, shmem-rss:2048kB
Mar 13 12:33:26 sh-101-19.int kernel: julia invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0
Mar 13 12:33:26 sh-101-19.int kernel: julia cpuset=step_0 mems_allowed=0-1
Mar 13 12:33:26 sh-101-19.int kernel: CPU: 12 PID: 99104 Comm: julia Kdump: loaded Tainted: G OE ------------ T 3.10.0-957.5.1.el7.x86_64 #1
Mar 13 12:33:26 sh-101-19.int kernel: Hardware name: Dell Inc. PowerEdge C6320/082F9M, BIOS 2.8.0 05/28/2018
Mar 13 12:33:26 sh-101-19.int kernel: Call Trace:
Mar 13 12:33:26 sh-101-19.int kernel: [] dump_stack+0x19/0x1b
Mar 13 12:33:26 sh-101-19.int kernel: [] dump_header+0x90/0x229
Mar 13 12:33:26 sh-101-19.int kernel: [] ? default_wake_function+0x12/0x20
Mar 13 12:33:26 sh-101-19.int kernel: [] ? find_lock_task_mm+0x56/0xc0
Mar 13 12:33:26 sh-101-19.int kernel: [] ? try_get_mem_cgroup_from_mm+0x28/0x60
Mar 13 12:33:26 sh-101-19.int kernel: [] oom_kill_process+0x254/0x3d0
Mar 13 12:33:26 sh-101-19.int kernel: [] mem_cgroup_oom_synchronize+0x546/0x570
Mar 13 12:33:26 sh-101-19.int kernel: [] ? mem_cgroup_charge_common+0xc0/0xc0
Mar 13 12:33:26 sh-101-19.int kernel: [] pagefault_out_of_memory+0x14/0x90
Mar 13 12:33:26 sh-101-19.int kernel: [] mm_fault_error+0x6a/0x157
Mar 13 12:33:26 sh-101-19.int kernel: [] __do_page_fault+0x3c8/0x500
Mar 13 12:33:26 sh-101-19.int kernel: [] do_page_fault+0x35/0x90
Mar 13 12:33:26 sh-101-19.int kernel: [] page_fault+0x28/0x30
Mar 13 12:33:26 sh-101-19.int kernel: Task in /slurm/uid_286587/job_39045684/step_0/task_0 killed as a result of limit of /slurm/uid_286587/job_39045684
Mar 13 12:33:26 sh-101-19.int kernel: memory: usage 40960000kB, limit 40960000kB, failcnt 19349
Mar 13 12:33:26 sh-101-19.int kernel: memory+swap: usage 40960000kB, limit 40960000kB, failcnt 0
Mar 13 12:33:26 sh-101-19.int kernel: kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
Mar 13 12:33:26 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39045684: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 13 12:33:26 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39045684/step_extern: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 13 12:33:26 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39045684/step_extern/task_0: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 13 12:33:26 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39045684/step_batch: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 13 12:33:26 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39045684/step_batch/task_0: cache:0KB rss:4304KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:2292KB active_anon:2012KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 13 12:33:26 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39045684/step_0: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 13 12:33:26 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39045684/step_0/task_0: cache:4300KB rss:38040672KB rss_huge:0KB mapped_file:4260KB swap:0KB inactive_anon:2351640KB active_anon:35618160KB inactive_file:40KB active_file:0KB unevictable:0KB
Mar 13 12:33:26 sh-101-19.int kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Mar 13 12:33:26 sh-101-19.int kernel: [99062] 0 99062 26988 88 9 0 0 sleep
Mar 13 12:33:26 sh-101-19.int kernel: [99072] 286587 99072 28329 409 13 0 0 slurm_script
Mar 13 12:33:26 sh-101-19.int kernel: [99078] 286587 99078 80960 1473 38 0 0 srun
Mar 13 12:33:26 sh-101-19.int kernel: [99080] 286587 99080 13093 215 29 0 0 srun
Mar 13 12:33:26 sh-101-19.int kernel: [99099] 286587 99099 219506 71784 240 0 0 julia
Mar 13 12:33:26 sh-101-19.int kernel: [99103] 286587 99103 3493200 3378142 6698 0 0 julia
Mar 13 12:33:26 sh-101-19.int kernel: [99104] 286587 99104 3558967 3411237 6767 0 0 julia
Mar 13 12:33:26 sh-101-19.int kernel: Memory cgroup out of memory: Kill process 99119 (julia) score 333 or sacrifice child
Mar 13 12:33:26 sh-101-19.int kernel: Killed process 99104 (julia) total-vm:14235868kB, anon-rss:13593244kB, file-rss:49648kB, shmem-rss:2056kB
Mar 13 12:33:27 sh-101-19.int kernel: julia invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0
Mar 13 12:33:27 sh-101-19.int kernel: julia cpuset=step_0 mems_allowed=0-1
Mar 13 12:33:27 sh-101-19.int kernel: CPU: 14 PID: 99103 Comm: julia Kdump: loaded Tainted: G OE ------------ T 3.10.0-957.5.1.el7.x86_64 #1
Mar 13 12:33:27 sh-101-19.int kernel: Hardware name: Dell Inc. PowerEdge C6320/082F9M, BIOS 2.8.0 05/28/2018
Mar 13 12:33:27 sh-101-19.int kernel: Call Trace:
Mar 13 12:33:27 sh-101-19.int kernel: [] dump_stack+0x19/0x1b
Mar 13 12:33:27 sh-101-19.int kernel: [] dump_header+0x90/0x229
Mar 13 12:33:27 sh-101-19.int kernel: [] ? default_wake_function+0x12/0x20
Mar 13 12:33:27 sh-101-19.int kernel: [] ? find_lock_task_mm+0x56/0xc0
Mar 13 12:33:27 sh-101-19.int kernel: [] ? try_get_mem_cgroup_from_mm+0x28/0x60
Mar 13 12:33:27 sh-101-19.int kernel: [] oom_kill_process+0x254/0x3d0
Mar 13 12:33:27 sh-101-19.int kernel: [] mem_cgroup_oom_synchronize+0x546/0x570
Mar 13 12:33:27 sh-101-19.int kernel: [] ? mem_cgroup_charge_common+0xc0/0xc0
Mar 13 12:33:27 sh-101-19.int kernel: [] pagefault_out_of_memory+0x14/0x90
Mar 13 12:33:27 sh-101-19.int kernel: [] mm_fault_error+0x6a/0x157
Mar 13 12:33:27 sh-101-19.int kernel: [] __do_page_fault+0x3c8/0x500
Mar 13 12:33:27 sh-101-19.int kernel: [] do_page_fault+0x35/0x90
Mar 13 12:33:27 sh-101-19.int kernel: [] page_fault+0x28/0x30
Mar 13 12:33:27 sh-101-19.int kernel: Task in /slurm/uid_286587/job_39045684/step_0/task_0 killed as a result of limit of /slurm/uid_286587/job_39045684
Mar 13 12:33:27 sh-101-19.int kernel: memory: usage 40960000kB, limit 40960000kB, failcnt 23032
Mar 13 12:33:27 sh-101-19.int kernel: memory+swap: usage 40960000kB, limit 40960000kB, failcnt 0
Mar 13 12:33:27 sh-101-19.int kernel: kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
Mar 13 12:33:27 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39045684: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 13 12:33:27 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39045684/step_extern: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 13 12:33:27 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39045684/step_extern/task_0: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 13 12:33:27 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39045684/step_batch: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 13 12:33:27 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39045684/step_batch/task_0: cache:0KB rss:4304KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:2292KB active_anon:2012KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 13 12:33:27 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39045684/step_0: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 13 12:33:27 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39045684/step_0/task_0: cache:4268KB rss:18725364KB rss_huge:0KB mapped_file:4264KB swap:0KB inactive_anon:2273304KB active_anon:16414316KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 13 12:33:27 sh-101-19.int kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Mar 13 12:33:27 sh-101-19.int kernel: [99062] 0 99062 26988 88 9 0 0 sleep
Mar 13 12:33:28 sh-101-19.int kernel: [99072] 286587 99072 28329 409 13 0 0 slurm_script
Mar 13 12:33:28 sh-101-19.int kernel: [99078] 286587 99078 80960 1473 38 0 0 srun
Mar 13 12:33:28 sh-101-19.int kernel: [99080] 286587 99080 13093 215 29 0 0 srun
Mar 13 12:33:28 sh-101-19.int kernel: [99099] 286587 99099 219506 71784 240 0 0 julia
Mar 13 12:33:28 sh-101-19.int kernel: [99103] 286587 99103 3493201 3378177 6698 0 0 julia
Mar 13 12:33:28 sh-101-19.int kernel: Memory cgroup out of memory: Kill process 99120 (julia) score 330 or sacrifice child
Mar 13 12:33:28 sh-101-19.int kernel: Killed process 99103 (julia) total-vm:13972804kB, anon-rss:13461000kB, file-rss:49648kB, shmem-rss:2060kB
Mar 13 12:55:09 sh-101-19.int kernel: julia invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0
Mar 13 12:55:09 sh-101-19.int kernel: julia cpuset=step_0 mems_allowed=0-1
Mar 13 12:55:09 sh-101-19.int kernel: CPU: 14 PID: 101873 Comm: julia Kdump: loaded Tainted: G OE ------------ T 3.10.0-957.5.1.el7.x86_64 #1
Mar 13 12:55:09 sh-101-19.int kernel: Hardware name: Dell Inc. PowerEdge C6320/082F9M, BIOS 2.8.0 05/28/2018
Mar 13 12:55:09 sh-101-19.int kernel: Call Trace:
Mar 13 12:55:09 sh-101-19.int kernel: [] dump_stack+0x19/0x1b
Mar 13 12:55:09 sh-101-19.int kernel: [] dump_header+0x90/0x229
Mar 13 12:55:09 sh-101-19.int kernel: [] ? default_wake_function+0x12/0x20
Mar 13 12:55:09 sh-101-19.int kernel: [] ? find_lock_task_mm+0x56/0xc0
Mar 13 12:55:09 sh-101-19.int kernel: [] ? try_get_mem_cgroup_from_mm+0x28/0x60
Mar 13 12:55:09 sh-101-19.int kernel: [] oom_kill_process+0x254/0x3d0
Mar 13 12:55:09 sh-101-19.int kernel: [] mem_cgroup_oom_synchronize+0x546/0x570
Mar 13 12:55:09 sh-101-19.int kernel: [] ? mem_cgroup_charge_common+0xc0/0xc0
Mar 13 12:55:09 sh-101-19.int kernel: [] pagefault_out_of_memory+0x14/0x90
Mar 13 12:55:09 sh-101-19.int kernel: [] mm_fault_error+0x6a/0x157
Mar 13 12:55:09 sh-101-19.int kernel: [] __do_page_fault+0x3c8/0x500
Mar 13 12:55:09 sh-101-19.int kernel: [] do_page_fault+0x35/0x90
Mar 13 12:55:09 sh-101-19.int kernel: [] page_fault+0x28/0x30
Mar 13 12:55:09 sh-101-19.int kernel: Task in /slurm/uid_286587/job_39045745/step_0/task_0 killed as a result of limit of /slurm/uid_286587/job_39045745
Mar 13 12:55:09 sh-101-19.int kernel: memory: usage 40960000kB, limit 40960000kB, failcnt 11003
Mar 13 12:55:09 sh-101-19.int kernel: memory+swap: usage 40960000kB, limit 40960000kB, failcnt 1
Mar 13 12:55:09 sh-101-19.int kernel: kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
Mar 13 12:55:09 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39045745: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 13 12:55:09 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39045745/step_extern: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 13 12:55:09 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39045745/step_extern/task_0: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 13 12:55:10 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39045745/step_batch: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 13 12:55:10 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39045745/step_batch/task_0: cache:0KB rss:4312KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:2268KB active_anon:2044KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 13 12:55:10 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39045745/step_0: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 13 12:55:10 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39045745/step_0/task_0: cache:4360KB rss:40951328KB rss_huge:0KB mapped_file:4260KB swap:0KB inactive_anon:2882224KB active_anon:38073368KB inactive_file:68KB active_file:0KB unevictable:0KB
Mar 13 12:55:10 sh-101-19.int kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Mar 13 12:55:10 sh-101-19.int kernel: [101830] 0 101830 26988 88 11 0 0 sleep
Mar 13 12:55:10 sh-101-19.int kernel: [101840] 286587 101840 28329 410 13 0 0 slurm_script
Mar 13 12:55:10 sh-101-19.int kernel: [101846] 286587 101846 80960 1481 38 0 0 srun
Mar 13 12:55:10 sh-101-19.int kernel: [101848] 286587 101848 13093 216 29 0 0 srun
Mar 13 12:55:10 sh-101-19.int kernel: [101866] 286587 101866 219558 71835 241 0 0 julia
Mar 13 12:55:10 sh-101-19.int kernel: [101873] 286587 101873 3473683 3383192 6701 0 0 julia
Mar 13 12:55:10 sh-101-19.int kernel: [101874] 286587 101874 3529931 3414922 6768 0 0 julia
Mar 13 12:55:10 sh-101-19.int kernel: [101875] 286587 101875 3534757 3419618 6782 0 0 julia
Mar 13 12:55:10 sh-101-19.int kernel: Memory cgroup out of memory: Kill process 101886 (julia) score 334 or sacrifice child
Mar 13 12:55:10 sh-101-19.int kernel: Killed process 101875 (julia) total-vm:14139028kB, anon-rss:13626772kB, file-rss:49644kB, shmem-rss:2056kB
Mar 13 12:55:10 sh-101-19.int kernel: julia invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0
Mar 13 12:55:10 sh-101-19.int kernel: julia cpuset=step_0 mems_allowed=0-1
Mar 13 12:55:10 sh-101-19.int kernel: CPU: 16 PID: 101873 Comm: julia Kdump: loaded Tainted: G OE ------------ T 3.10.0-957.5.1.el7.x86_64 #1
Mar 13 12:55:10 sh-101-19.int kernel: Hardware name: Dell Inc. PowerEdge C6320/082F9M, BIOS 2.8.0 05/28/2018
Mar 13 12:55:10 sh-101-19.int kernel: Call Trace:
Mar 13 12:55:10 sh-101-19.int kernel: [] dump_stack+0x19/0x1b
Mar 13 12:55:10 sh-101-19.int kernel: [] dump_header+0x90/0x229
Mar 13 12:55:10 sh-101-19.int kernel: [] ? find_lock_task_mm+0x56/0xc0
Mar 13 12:55:10 sh-101-19.int kernel: [] ? try_get_mem_cgroup_from_mm+0x28/0x60
Mar 13 12:55:10 sh-101-19.int kernel: [] oom_kill_process+0x254/0x3d0
Mar 13 12:55:10 sh-101-19.int kernel: [] mem_cgroup_oom_synchronize+0x546/0x570
Mar 13 12:55:10 sh-101-19.int kernel: [] ? mem_cgroup_charge_common+0xc0/0xc0
Mar 13 12:55:10 sh-101-19.int kernel: [] pagefault_out_of_memory+0x14/0x90
Mar 13 12:55:10 sh-101-19.int kernel: [] mm_fault_error+0x6a/0x157
Mar 13 12:55:10 sh-101-19.int kernel: [] __do_page_fault+0x3c8/0x500
Mar 13 12:55:10 sh-101-19.int kernel: [] do_page_fault+0x35/0x90
Mar 13 12:55:10 sh-101-19.int kernel: [] page_fault+0x28/0x30
Mar 13 12:55:10 sh-101-19.int kernel: Task in /slurm/uid_286587/job_39045745/step_0/task_0 killed as a result of limit of /slurm/uid_286587/job_39045745
Mar 13 12:55:10 sh-101-19.int kernel: memory: usage 40960000kB, limit 40960000kB, failcnt 13672
Mar 13 12:55:10 sh-101-19.int kernel: memory+swap: usage 40960000kB, limit 40960000kB, failcnt 1
Mar 13 12:55:10 sh-101-19.int kernel: kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
Mar 13 12:55:10 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39045745: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 13 12:55:10 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39045745/step_extern: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 13 12:55:10 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39045745/step_extern/task_0: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 13 12:55:10 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39045745/step_batch: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 13 12:55:10 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39045745/step_batch/task_0: cache:0KB rss:4312KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:2268KB active_anon:2044KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 13 12:55:10 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39045745/step_0: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 13 12:55:10 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39045745/step_0/task_0: cache:4356KB rss:38080744KB rss_huge:0KB mapped_file:4260KB swap:0KB inactive_anon:2379992KB active_anon:35663612KB inactive_file:96KB active_file:0KB unevictable:0KB
Mar 13 12:55:10 sh-101-19.int kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Mar 13 12:55:10 sh-101-19.int kernel: [101830] 0 101830 26988 88 11 0 0 sleep
Mar 13 12:55:10 sh-101-19.int kernel: [101840] 286587 101840 28329 410 13 0 0 slurm_script
Mar 13 12:55:10 sh-101-19.int kernel: [101846] 286587 101846 80960 1481 38 0 0 srun
Mar 13 12:55:10 sh-101-19.int kernel: [101848] 286587 101848 13093 216 29 0 0 srun
Mar 13 12:55:10 sh-101-19.int kernel: [101866] 286587 101866 219558 71835 241 0 0 julia
Mar 13 12:55:10 sh-101-19.int kernel: [101873] 286587 101873 3473683 3383192 6701 0 0 julia
Mar 13 12:55:10 sh-101-19.int kernel: [101874] 286587 101874 3529931 3414922 6768 0 0 julia
Mar 13 12:55:10 sh-101-19.int kernel: Memory cgroup out of memory: Kill process 101890 (julia) score 334 or sacrifice child
Mar 13 12:55:10 sh-101-19.int kernel: Killed process 101874 (julia) total-vm:14119724kB, anon-rss:13607968kB, file-rss:49648kB, shmem-rss:2072kB Mar 13 13:16:53 sh-101-19.int kernel: julia invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0 Mar 13 13:16:53 sh-101-19.int kernel: julia cpuset=step_0 mems_allowed=0-1 Mar 13 13:16:53 sh-101-19.int kernel: CPU: 10 PID: 104626 Comm: julia Kdump: loaded Tainted: G OE ------------ T 3.10.0-957.5.1.el7.x86_64 #1 Mar 13 13:16:53 sh-101-19.int kernel: Hardware name: Dell Inc. PowerEdge C6320/082F9M, BIOS 2.8.0 05/28/2018 Mar 13 13:16:53 sh-101-19.int kernel: Call Trace: Mar 13 13:16:53 sh-101-19.int kernel: [] dump_stack+0x19/0x1b Mar 13 13:16:53 sh-101-19.int kernel: [] dump_header+0x90/0x229 Mar 13 13:16:53 sh-101-19.int kernel: [] ? default_wake_function+0x12/0x20 Mar 13 13:16:53 sh-101-19.int kernel: [] ? find_lock_task_mm+0x56/0xc0 Mar 13 13:16:53 sh-101-19.int kernel: [] ? try_get_mem_cgroup_from_mm+0x28/0x60 Mar 13 13:16:53 sh-101-19.int kernel: [] oom_kill_process+0x254/0x3d0 Mar 13 13:16:53 sh-101-19.int kernel: [] mem_cgroup_oom_synchronize+0x546/0x570 Mar 13 13:16:53 sh-101-19.int kernel: [] ? 
mem_cgroup_charge_common+0xc0/0xc0 Mar 13 13:16:53 sh-101-19.int kernel: [] pagefault_out_of_memory+0x14/0x90 Mar 13 13:16:53 sh-101-19.int kernel: [] mm_fault_error+0x6a/0x157 Mar 13 13:16:53 sh-101-19.int kernel: [] __do_page_fault+0x3c8/0x500 Mar 13 13:16:53 sh-101-19.int kernel: [] do_page_fault+0x35/0x90 Mar 13 13:16:53 sh-101-19.int kernel: [] page_fault+0x28/0x30 Mar 13 13:16:53 sh-101-19.int kernel: Task in /slurm/uid_286587/job_39051023/step_0/task_0 killed as a result of limit of /slurm/uid_286587/job_39051023 Mar 13 13:16:53 sh-101-19.int kernel: memory: usage 40960000kB, limit 40960000kB, failcnt 9191 Mar 13 13:16:53 sh-101-19.int kernel: memory+swap: usage 40960000kB, limit 40960000kB, failcnt 0 Mar 13 13:16:53 sh-101-19.int kernel: kmem: usage 0kB, limit 9007199254740988kB, failcnt 0 Mar 13 13:16:53 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39051023: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB Mar 13 13:16:53 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39051023/step_extern: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB Mar 13 13:16:53 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39051023/step_extern/task_0: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB Mar 13 13:16:53 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39051023/step_batch: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB Mar 13 13:16:53 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39051023/step_batch/task_0: cache:0KB rss:4312KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:2228KB active_anon:2084KB 
inactive_file:0KB active_file:0KB unevictable:0KB Mar 13 13:16:53 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39051023/step_0: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB Mar 13 13:16:54 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39051023/step_0/task_0: cache:4376KB rss:40951304KB rss_huge:0KB mapped_file:4260KB swap:0KB inactive_anon:2858232KB active_anon:38097336KB inactive_file:120KB active_file:0KB unevictable:0KB Mar 13 13:16:54 sh-101-19.int kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name Mar 13 13:16:54 sh-101-19.int kernel: [104572] 0 104572 26988 89 10 0 0 sleep Mar 13 13:16:54 sh-101-19.int kernel: [104588] 286587 104588 28329 410 14 0 0 slurm_script Mar 13 13:16:54 sh-101-19.int kernel: [104595] 286587 104595 80960 1480 39 0 0 srun Mar 13 13:16:54 sh-101-19.int kernel: [104600] 286587 104600 13093 216 29 0 0 srun Mar 13 13:16:54 sh-101-19.int kernel: [104618] 286587 104618 219325 71603 244 0 0 julia Mar 13 13:16:54 sh-101-19.int kernel: [104624] 286587 104624 3474927 3384799 6706 0 0 julia Mar 13 13:16:54 sh-101-19.int kernel: [104625] 286587 104625 3538960 3424260 6786 0 0 julia Mar 13 13:16:54 sh-101-19.int kernel: [104626] 286587 104626 3557263 3409860 6759 0 0 julia Mar 13 13:16:54 sh-101-19.int kernel: Memory cgroup out of memory: Kill process 104641 (julia) score 335 or sacrifice child Mar 13 13:16:54 sh-101-19.int kernel: Killed process 104625 (julia) total-vm:14155840kB, anon-rss:13644036kB, file-rss:50924kB, shmem-rss:2080kB Mar 13 14:55:04 sh-101-19.int kernel: julia invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0 Mar 13 14:55:04 sh-101-19.int kernel: julia cpuset=step_0 mems_allowed=0-1 Mar 13 14:55:04 sh-101-19.int kernel: CPU: 7 PID: 115665 Comm: julia Kdump: loaded Tainted: G OE ------------ T 3.10.0-957.5.1.el7.x86_64 #1 Mar 13 14:55:04 sh-101-19.int kernel: 
Hardware name: Dell Inc. PowerEdge C6320/082F9M, BIOS 2.8.0 05/28/2018 Mar 13 14:55:04 sh-101-19.int kernel: Call Trace: Mar 13 14:55:04 sh-101-19.int kernel: [] dump_stack+0x19/0x1b Mar 13 14:55:04 sh-101-19.int kernel: [] dump_header+0x90/0x229 Mar 13 14:55:04 sh-101-19.int kernel: [] ? default_wake_function+0x12/0x20 Mar 13 14:55:04 sh-101-19.int kernel: [] ? find_lock_task_mm+0x56/0xc0 Mar 13 14:55:04 sh-101-19.int kernel: [] ? try_get_mem_cgroup_from_mm+0x28/0x60 Mar 13 14:55:04 sh-101-19.int kernel: [] oom_kill_process+0x254/0x3d0 Mar 13 14:55:04 sh-101-19.int kernel: [] mem_cgroup_oom_synchronize+0x546/0x570 Mar 13 14:55:04 sh-101-19.int kernel: [] ? mem_cgroup_charge_common+0xc0/0xc0 Mar 13 14:55:04 sh-101-19.int kernel: [] pagefault_out_of_memory+0x14/0x90 Mar 13 14:55:04 sh-101-19.int kernel: [] mm_fault_error+0x6a/0x157 Mar 13 14:55:04 sh-101-19.int kernel: [] __do_page_fault+0x3c8/0x500 Mar 13 14:55:04 sh-101-19.int kernel: [] do_page_fault+0x35/0x90 Mar 13 14:55:04 sh-101-19.int kernel: [] page_fault+0x28/0x30 Mar 13 14:55:04 sh-101-19.int kernel: Task in /slurm/uid_286587/job_39053756/step_0/task_0 killed as a result of limit of /slurm/uid_286587/job_39053756 Mar 13 14:55:04 sh-101-19.int kernel: memory: usage 61440000kB, limit 61440000kB, failcnt 9155 Mar 13 14:55:04 sh-101-19.int kernel: memory+swap: usage 61440000kB, limit 61440000kB, failcnt 0 Mar 13 14:55:04 sh-101-19.int kernel: kmem: usage 0kB, limit 9007199254740988kB, failcnt 0 Mar 13 14:55:04 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39053756: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB Mar 13 14:55:04 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39053756/step_extern: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB Mar 13 14:55:04 sh-101-19.int kernel: Memory 
cgroup stats for /slurm/uid_286587/job_39053756/step_extern/task_0: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB Mar 13 14:55:04 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39053756/step_batch: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB Mar 13 14:55:04 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39053756/step_batch/task_0: cache:0KB rss:4312KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:2296KB active_anon:2016KB inactive_file:0KB active_file:0KB unevictable:0KB Mar 13 14:55:04 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39053756/step_0: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB Mar 13 14:55:04 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39053756/step_0/task_0: cache:4376KB rss:61431312KB rss_huge:0KB mapped_file:4260KB swap:0KB inactive_anon:3930016KB active_anon:57505548KB inactive_file:116KB active_file:0KB unevictable:0KB Mar 13 14:55:04 sh-101-19.int kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name Mar 13 14:55:04 sh-101-19.int kernel: [115581] 0 115581 26988 88 10 0 0 sleep Mar 13 14:55:04 sh-101-19.int kernel: [115611] 286587 115611 28329 410 12 0 0 slurm_script Mar 13 14:55:04 sh-101-19.int kernel: [115619] 286587 115619 80960 1474 38 0 0 srun Mar 13 14:55:04 sh-101-19.int kernel: [115620] 286587 115620 13093 216 29 0 0 srun Mar 13 14:55:04 sh-101-19.int kernel: [115641] 286587 115641 219507 71807 242 0 0 julia Mar 13 14:55:04 sh-101-19.int kernel: [115664] 286587 115664 5210354 5120140 10095 0 0 julia Mar 13 14:55:04 sh-101-19.int kernel: [115665] 286587 115665 5358484 5243725 10341 0 0 julia Mar 13 14:55:04 sh-101-19.int kernel: [115666] 286587 
115666 5089475 4974750 9814 0 0 julia Mar 13 14:55:04 sh-101-19.int kernel: Memory cgroup out of memory: Kill process 115681 (julia) score 342 or sacrifice child Mar 13 14:55:04 sh-101-19.int kernel: Killed process 115665 (julia) total-vm:21433936kB, anon-rss:20921896kB, file-rss:50928kB, shmem-rss:2076kB Mar 13 15:22:54 sh-101-19.int kernel: R invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0 Mar 13 15:22:54 sh-101-19.int kernel: R cpuset=step_30 mems_allowed=0-1 Mar 13 15:22:54 sh-101-19.int kernel: CPU: 9 PID: 124732 Comm: R Kdump: loaded Tainted: G OE ------------ T 3.10.0-957.5.1.el7.x86_64 #1 Mar 13 15:22:54 sh-101-19.int kernel: Hardware name: Dell Inc. PowerEdge C6320/082F9M, BIOS 2.8.0 05/28/2018 Mar 13 15:22:54 sh-101-19.int kernel: Call Trace: Mar 13 15:22:54 sh-101-19.int kernel: [] dump_stack+0x19/0x1b Mar 13 15:22:54 sh-101-19.int kernel: [] dump_header+0x90/0x229 Mar 13 15:22:54 sh-101-19.int kernel: [] ? default_wake_function+0x12/0x20 Mar 13 15:22:54 sh-101-19.int kernel: [] ? find_lock_task_mm+0x56/0xc0 Mar 13 15:22:54 sh-101-19.int kernel: [] ? try_get_mem_cgroup_from_mm+0x28/0x60 Mar 13 15:22:54 sh-101-19.int kernel: [] oom_kill_process+0x254/0x3d0 Mar 13 15:22:54 sh-101-19.int kernel: [] mem_cgroup_oom_synchronize+0x546/0x570 Mar 13 15:22:55 sh-101-19.int kernel: [] ? 
mem_cgroup_charge_common+0xc0/0xc0 Mar 13 15:22:55 sh-101-19.int kernel: [] pagefault_out_of_memory+0x14/0x90 Mar 13 15:22:55 sh-101-19.int kernel: [] mm_fault_error+0x6a/0x157 Mar 13 15:22:55 sh-101-19.int kernel: [] __do_page_fault+0x3c8/0x500 Mar 13 15:22:55 sh-101-19.int kernel: [] do_page_fault+0x35/0x90 Mar 13 15:22:55 sh-101-19.int kernel: [] page_fault+0x28/0x30 Mar 13 15:22:55 sh-101-19.int kernel: Task in /slurm/uid_348003/job_39058397/step_30/task_0 killed as a result of limit of /slurm/uid_348003/job_39058397/step_30 Mar 13 15:22:55 sh-101-19.int kernel: memory: usage 4194304kB, limit 4194304kB, failcnt 68226 Mar 13 15:22:55 sh-101-19.int kernel: memory+swap: usage 4194304kB, limit 4194304kB, failcnt 0 Mar 13 15:22:55 sh-101-19.int kernel: kmem: usage 0kB, limit 9007199254740988kB, failcnt 0 Mar 13 15:22:55 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_348003/job_39058397/step_30: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB Mar 13 15:22:55 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_348003/job_39058397/step_30/task_0: cache:0KB rss:4194304KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:996992KB active_anon:3197276KB inactive_file:0KB active_file:0KB unevictable:0KB Mar 13 15:22:55 sh-101-19.int kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name Mar 13 15:22:55 sh-101-19.int kernel: [124732] 348003 124732 1156961 1050673 2129 0 0 R Mar 13 15:22:55 sh-101-19.int kernel: Memory cgroup out of memory: Kill process 124732 (R) score 1004 or sacrifice child Mar 13 15:22:55 sh-101-19.int kernel: Killed process 124732 (R) total-vm:4627844kB, anon-rss:4194144kB, file-rss:8548kB, shmem-rss:0kB Mar 13 15:23:03 sh-101-19.int kernel: R invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0 Mar 13 15:23:03 sh-101-19.int kernel: R cpuset=step_31 mems_allowed=0-1 Mar 13 15:23:03 sh-101-19.int kernel: CPU: 13 PID: 
124733 Comm: R Kdump: loaded Tainted: G OE ------------ T 3.10.0-957.5.1.el7.x86_64 #1
Mar 13 15:23:03 sh-101-19.int kernel: Hardware name: Dell Inc. PowerEdge C6320/082F9M, BIOS 2.8.0 05/28/2018
Mar 13 15:23:03 sh-101-19.int kernel: Call Trace:
Mar 13 15:23:03 sh-101-19.int kernel: [] dump_stack+0x19/0x1b
Mar 13 15:23:03 sh-101-19.int kernel: [] dump_header+0x90/0x229
Mar 13 15:23:03 sh-101-19.int kernel: [] ? default_wake_function+0x12/0x20
Mar 13 15:23:03 sh-101-19.int kernel: [] ? find_lock_task_mm+0x56/0xc0
Mar 13 15:23:03 sh-101-19.int kernel: [] ? try_get_mem_cgroup_from_mm+0x28/0x60
Mar 13 15:23:03 sh-101-19.int kernel: [] oom_kill_process+0x254/0x3d0
Mar 13 15:23:03 sh-101-19.int kernel: [] mem_cgroup_oom_synchronize+0x546/0x570
Mar 13 15:23:03 sh-101-19.int kernel: [] ? mem_cgroup_charge_common+0xc0/0xc0
Mar 13 15:23:03 sh-101-19.int kernel: [] pagefault_out_of_memory+0x14/0x90
Mar 13 15:23:03 sh-101-19.int kernel: [] mm_fault_error+0x6a/0x157
Mar 13 15:23:03 sh-101-19.int kernel: [] __do_page_fault+0x3c8/0x500
Mar 13 15:23:03 sh-101-19.int kernel: [] do_page_fault+0x35/0x90
Mar 13 15:23:03 sh-101-19.int kernel: [] page_fault+0x28/0x30
Mar 13 15:23:03 sh-101-19.int kernel: Task in /slurm/uid_348003/job_39058397/step_31/task_0 killed as a result of limit of /slurm/uid_348003/job_39058397/step_31
Mar 13 15:23:03 sh-101-19.int kernel: memory: usage 4194304kB, limit 4194304kB, failcnt 73780
Mar 13 15:23:03 sh-101-19.int kernel: memory+swap: usage 4194304kB, limit 4194304kB, failcnt 0
Mar 13 15:23:03 sh-101-19.int kernel: kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
Mar 13 15:23:03 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_348003/job_39058397/step_31: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 13 15:23:03 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_348003/job_39058397/step_31/task_0: cache:0KB rss:4194304KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:912852KB active_anon:3281452KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 13 15:23:03 sh-101-19.int kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Mar 13 15:23:03 sh-101-19.int kernel: [124733] 348003 124733 1151681 1050560 2131 0 0 R
Mar 13 15:23:03 sh-101-19.int kernel: Memory cgroup out of memory: Kill process 124733 (R) score 1003 or sacrifice child
Mar 13 15:23:03 sh-101-19.int kernel: Killed process 124733 (R) total-vm:4606724kB, anon-rss:4194116kB, file-rss:8124kB, shmem-rss:0kB
Mar 13 15:24:46 sh-101-19.int kernel: R invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0
Mar 13 15:24:46 sh-101-19.int kernel: R cpuset=step_69 mems_allowed=0-1
Mar 13 15:24:46 sh-101-19.int kernel: CPU: 9 PID: 125008 Comm: R Kdump: loaded Tainted: G OE ------------ T 3.10.0-957.5.1.el7.x86_64 #1
Mar 13 15:24:46 sh-101-19.int kernel: Hardware name: Dell Inc. PowerEdge C6320/082F9M, BIOS 2.8.0 05/28/2018
Mar 13 15:24:46 sh-101-19.int kernel: Call Trace:
Mar 13 15:24:46 sh-101-19.int kernel: [] dump_stack+0x19/0x1b
Mar 13 15:24:46 sh-101-19.int kernel: [] dump_header+0x90/0x229
Mar 13 15:24:46 sh-101-19.int kernel: [] ? default_wake_function+0x12/0x20
Mar 13 15:24:46 sh-101-19.int kernel: [] ? find_lock_task_mm+0x56/0xc0
Mar 13 15:24:46 sh-101-19.int kernel: [] ? try_get_mem_cgroup_from_mm+0x28/0x60
Mar 13 15:24:46 sh-101-19.int kernel: [] oom_kill_process+0x254/0x3d0
Mar 13 15:24:46 sh-101-19.int kernel: [] mem_cgroup_oom_synchronize+0x546/0x570
Mar 13 15:24:46 sh-101-19.int kernel: [] ? mem_cgroup_charge_common+0xc0/0xc0
Mar 13 15:24:46 sh-101-19.int kernel: [] pagefault_out_of_memory+0x14/0x90
Mar 13 15:24:46 sh-101-19.int kernel: [] mm_fault_error+0x6a/0x157
Mar 13 15:24:46 sh-101-19.int kernel: [] __do_page_fault+0x3c8/0x500
Mar 13 15:24:46 sh-101-19.int kernel: [] do_page_fault+0x35/0x90
Mar 13 15:24:46 sh-101-19.int kernel: [] page_fault+0x28/0x30
Mar 13 15:24:46 sh-101-19.int kernel: Task in /slurm/uid_348003/job_39058397/step_69/task_0 killed as a result of limit of /slurm/uid_348003/job_39058397/step_69
Mar 13 15:24:46 sh-101-19.int kernel: memory: usage 4194304kB, limit 4194304kB, failcnt 35619
Mar 13 15:24:46 sh-101-19.int kernel: memory+swap: usage 4194304kB, limit 4194304kB, failcnt 0
Mar 13 15:24:46 sh-101-19.int kernel: kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
Mar 13 15:24:47 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_348003/job_39058397/step_69: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 13 15:24:47 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_348003/job_39058397/step_69/task_0: cache:4KB rss:4194300KB rss_huge:0KB mapped_file:4KB swap:0KB inactive_anon:904832KB active_anon:3289468KB inactive_file:4KB active_file:0KB unevictable:0KB
Mar 13 15:24:47 sh-101-19.int kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Mar 13 15:24:47 sh-101-19.int kernel: [125008] 348003 125008 1187793 1050682 2129 0 0 R
Mar 13 15:24:47 sh-101-19.int kernel: Memory cgroup out of memory: Kill process 125008 (R) score 1004 or sacrifice child
Mar 13 15:24:47 sh-101-19.int kernel: Killed process 125008 (R) total-vm:4751172kB, anon-rss:4194244kB, file-rss:8484kB, shmem-rss:0kB
Mar 13 15:25:16 sh-101-19.int kernel: R invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0
Mar 13 15:25:16 sh-101-19.int kernel: R cpuset=step_84 mems_allowed=0-1
Mar 13 15:25:16 sh-101-19.int kernel: CPU: 13 PID:
125040 Comm: R Kdump: loaded Tainted: G OE ------------ T 3.10.0-957.5.1.el7.x86_64 #1
Mar 13 15:25:16 sh-101-19.int kernel: Hardware name: Dell Inc. PowerEdge C6320/082F9M, BIOS 2.8.0 05/28/2018
Mar 13 15:25:16 sh-101-19.int kernel: Call Trace:
Mar 13 15:25:16 sh-101-19.int kernel: [] dump_stack+0x19/0x1b
Mar 13 15:25:16 sh-101-19.int kernel: [] dump_header+0x90/0x229
Mar 13 15:25:16 sh-101-19.int kernel: [] ? default_wake_function+0x12/0x20
Mar 13 15:25:16 sh-101-19.int kernel: [] ? find_lock_task_mm+0x56/0xc0
Mar 13 15:25:16 sh-101-19.int kernel: [] ? try_get_mem_cgroup_from_mm+0x28/0x60
Mar 13 15:25:16 sh-101-19.int kernel: [] oom_kill_process+0x254/0x3d0
Mar 13 15:25:17 sh-101-19.int kernel: [] mem_cgroup_oom_synchronize+0x546/0x570
Mar 13 15:25:17 sh-101-19.int kernel: [] ? mem_cgroup_charge_common+0xc0/0xc0
Mar 13 15:25:17 sh-101-19.int kernel: [] pagefault_out_of_memory+0x14/0x90
Mar 13 15:25:17 sh-101-19.int kernel: [] mm_fault_error+0x6a/0x157
Mar 13 15:25:17 sh-101-19.int kernel: [] __do_page_fault+0x3c8/0x500
Mar 13 15:25:17 sh-101-19.int kernel: [] do_page_fault+0x35/0x90
Mar 13 15:25:17 sh-101-19.int kernel: [] page_fault+0x28/0x30
Mar 13 15:25:17 sh-101-19.int kernel: Task in /slurm/uid_348003/job_39058397/step_84/task_0 killed as a result of limit of /slurm/uid_348003/job_39058397/step_84
Mar 13 15:25:17 sh-101-19.int kernel: memory: usage 4194304kB, limit 4194304kB, failcnt 59401
Mar 13 15:25:17 sh-101-19.int kernel: memory+swap: usage 4194304kB, limit 4194304kB, failcnt 0
Mar 13 15:25:17 sh-101-19.int kernel: kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
Mar 13 15:25:17 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_348003/job_39058397/step_84: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 13 15:25:17 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_348003/job_39058397/step_84/task_0: cache:0KB rss:4194304KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:699776KB active_anon:3494528KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 13 15:25:17 sh-101-19.int kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Mar 13 15:25:17 sh-101-19.int kernel: [125040] 348003 125040 1180215 1050545 2131 0 0 R
Mar 13 15:25:17 sh-101-19.int kernel: Memory cgroup out of memory: Kill process 125040 (R) score 1003 or sacrifice child
Mar 13 15:25:17 sh-101-19.int kernel: Killed process 125040 (R) total-vm:4720860kB, anon-rss:4194052kB, file-rss:8128kB, shmem-rss:0kB
Mar 13 15:26:33 sh-101-19.int kernel: R invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0
Mar 13 15:26:33 sh-101-19.int kernel: R cpuset=step_113 mems_allowed=0-1
Mar 13 15:26:33 sh-101-19.int kernel: CPU: 9 PID: 125264 Comm: R Kdump: loaded Tainted: G OE ------------ T 3.10.0-957.5.1.el7.x86_64 #1
Mar 13 15:26:33 sh-101-19.int kernel: Hardware name: Dell Inc. PowerEdge C6320/082F9M, BIOS 2.8.0 05/28/2018
Mar 13 15:26:33 sh-101-19.int kernel: Call Trace:
Mar 13 15:26:33 sh-101-19.int kernel: [] dump_stack+0x19/0x1b
Mar 13 15:26:33 sh-101-19.int kernel: [] dump_header+0x90/0x229
Mar 13 15:26:33 sh-101-19.int kernel: [] ? default_wake_function+0x12/0x20
Mar 13 15:26:33 sh-101-19.int kernel: [] ? find_lock_task_mm+0x56/0xc0
Mar 13 15:26:33 sh-101-19.int kernel: [] ? try_get_mem_cgroup_from_mm+0x28/0x60
Mar 13 15:26:33 sh-101-19.int kernel: [] oom_kill_process+0x254/0x3d0
Mar 13 15:26:33 sh-101-19.int kernel: [] mem_cgroup_oom_synchronize+0x546/0x570
Mar 13 15:26:33 sh-101-19.int kernel: [] ? mem_cgroup_charge_common+0xc0/0xc0
Mar 13 15:26:33 sh-101-19.int kernel: [] pagefault_out_of_memory+0x14/0x90
Mar 13 15:26:33 sh-101-19.int kernel: [] mm_fault_error+0x6a/0x157
Mar 13 15:26:33 sh-101-19.int kernel: [] __do_page_fault+0x3c8/0x500
Mar 13 15:26:33 sh-101-19.int kernel: [] do_page_fault+0x35/0x90
Mar 13 15:26:33 sh-101-19.int kernel: [] page_fault+0x28/0x30
Mar 13 15:26:33 sh-101-19.int kernel: Task in /slurm/uid_348003/job_39058397/step_113/task_0 killed as a result of limit of /slurm/uid_348003/job_39058397/step_113
Mar 13 15:26:33 sh-101-19.int kernel: memory: usage 4194304kB, limit 4194304kB, failcnt 34162
Mar 13 15:26:33 sh-101-19.int kernel: memory+swap: usage 4194304kB, limit 4194304kB, failcnt 0
Mar 13 15:26:33 sh-101-19.int kernel: kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
Mar 13 15:26:33 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_348003/job_39058397/step_113: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 13 15:26:33 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_348003/job_39058397/step_113/task_0: cache:0KB rss:4194304KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:699136KB active_anon:3495164KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 13 15:26:33 sh-101-19.int kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Mar 13 15:26:33 sh-101-19.int kernel: [125264] 348003 125264 1133546 1050561 2128 0 0 R
Mar 13 15:26:33 sh-101-19.int kernel: Memory cgroup out of memory: Kill process 125264 (R) score 1003 or sacrifice child
Mar 13 15:26:33 sh-101-19.int kernel: Killed process 125264 (R) total-vm:4534184kB, anon-rss:4194120kB, file-rss:8124kB, shmem-rss:0kB
Mar 13 15:32:54 sh-101-19.int kernel: julia invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0
Mar 13 15:32:54 sh-101-19.int kernel: julia cpuset=step_0 mems_allowed=0-1
Mar 13 15:32:54 sh-101-19.int kernel: CPU: 10
PID: 115664 Comm: julia Kdump: loaded Tainted: G OE ------------ T 3.10.0-957.5.1.el7.x86_64 #1
Mar 13 15:32:54 sh-101-19.int kernel: Hardware name: Dell Inc. PowerEdge C6320/082F9M, BIOS 2.8.0 05/28/2018
Mar 13 15:32:54 sh-101-19.int kernel: Call Trace:
Mar 13 15:32:54 sh-101-19.int kernel: [] dump_stack+0x19/0x1b
Mar 13 15:32:54 sh-101-19.int kernel: [] dump_header+0x90/0x229
Mar 13 15:32:54 sh-101-19.int kernel: [] ? default_wake_function+0x12/0x20
Mar 13 15:32:54 sh-101-19.int kernel: [] ? find_lock_task_mm+0x56/0xc0
Mar 13 15:32:54 sh-101-19.int kernel: [] ? try_get_mem_cgroup_from_mm+0x28/0x60
Mar 13 15:32:54 sh-101-19.int kernel: [] oom_kill_process+0x254/0x3d0
Mar 13 15:32:54 sh-101-19.int kernel: [] mem_cgroup_oom_synchronize+0x546/0x570
Mar 13 15:32:54 sh-101-19.int kernel: [] ? mem_cgroup_charge_common+0xc0/0xc0
Mar 13 15:32:54 sh-101-19.int kernel: [] pagefault_out_of_memory+0x14/0x90
Mar 13 15:32:54 sh-101-19.int kernel: [] mm_fault_error+0x6a/0x157
Mar 13 15:32:54 sh-101-19.int kernel: [] __do_page_fault+0x3c8/0x500
Mar 13 15:32:54 sh-101-19.int kernel: [] do_page_fault+0x35/0x90
Mar 13 15:32:54 sh-101-19.int kernel: [] page_fault+0x28/0x30
Mar 13 15:32:54 sh-101-19.int kernel: Task in /slurm/uid_286587/job_39053756/step_0/task_0 killed as a result of limit of /slurm/uid_286587/job_39053756
Mar 13 15:32:54 sh-101-19.int kernel: memory: usage 61440000kB, limit 61440000kB, failcnt 16645
Mar 13 15:32:54 sh-101-19.int kernel: memory+swap: usage 61440000kB, limit 61440000kB, failcnt 0
Mar 13 15:32:54 sh-101-19.int kernel: kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
Mar 13 15:32:54 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39053756: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 13 15:32:54 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39053756/step_extern: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 13 15:32:54 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39053756/step_extern/task_0: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 13 15:32:54 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39053756/step_batch: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 13 15:32:54 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39053756/step_batch/task_0: cache:0KB rss:4312KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:2296KB active_anon:2016KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 13 15:32:54 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39053756/step_0: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 13 15:32:54 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39053756/step_0/task_0: cache:3244KB rss:61432444KB rss_huge:0KB mapped_file:3180KB swap:0KB inactive_anon:3981888KB active_anon:57453736KB inactive_file:64KB active_file:0KB unevictable:0KB
Mar 13 15:32:54 sh-101-19.int kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Mar 13 15:32:54 sh-101-19.int kernel: [115581] 0 115581 26988 88 10 0 0 sleep
Mar 13 15:32:54 sh-101-19.int kernel: [115611] 286587 115611 28329 410 12 0 0 slurm_script
Mar 13 15:32:54 sh-101-19.int kernel: [115619] 286587 115619 80960 1474 38 0 0 srun
Mar 13 15:32:54 sh-101-19.int kernel: [115620] 286587 115620 13093 216 29 0 0 srun
Mar 13 15:32:54 sh-101-19.int kernel: [115641] 286587 115641 219507 71836 242 0 0 julia
Mar 13 15:32:54 sh-101-19.int kernel: [115664] 286587 115664 7851227 7761195 15253 0 0 julia
Mar 13 15:32:54 sh-101-19.int kernel: [115666] 286587 115666 7679168 7564558 14872 0 0 julia
Mar 13 15:32:54 sh-101-19.int kernel: Memory cgroup out of memory: Kill process 115667 (julia) score 506 or sacrifice child
Mar 13 15:32:54 sh-101-19.int kernel: Killed process 115664 (julia) total-vm:31404908kB, anon-rss:30991784kB, file-rss:50956kB, shmem-rss:2040kB
Mar 13 16:25:05 sh-101-19.int kernel: julia invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0
Mar 13 16:25:05 sh-101-19.int kernel: julia cpuset=step_0 mems_allowed=0-1
Mar 13 16:25:05 sh-101-19.int kernel: CPU: 12 PID: 127586 Comm: julia Kdump: loaded Tainted: G OE ------------ T 3.10.0-957.5.1.el7.x86_64 #1
Mar 13 16:25:05 sh-101-19.int kernel: Hardware name: Dell Inc. PowerEdge C6320/082F9M, BIOS 2.8.0 05/28/2018
Mar 13 16:25:05 sh-101-19.int kernel: Call Trace:
Mar 13 16:25:05 sh-101-19.int kernel: [] dump_stack+0x19/0x1b
Mar 13 16:25:05 sh-101-19.int kernel: [] dump_header+0x90/0x229
Mar 13 16:25:05 sh-101-19.int kernel: [] ? find_lock_task_mm+0x56/0xc0
Mar 13 16:25:05 sh-101-19.int kernel: [] ? try_get_mem_cgroup_from_mm+0x28/0x60
Mar 13 16:25:05 sh-101-19.int kernel: [] oom_kill_process+0x254/0x3d0
Mar 13 16:25:05 sh-101-19.int kernel: [] mem_cgroup_oom_synchronize+0x546/0x570
Mar 13 16:25:05 sh-101-19.int kernel: [] ? mem_cgroup_charge_common+0xc0/0xc0
Mar 13 16:25:05 sh-101-19.int kernel: [] pagefault_out_of_memory+0x14/0x90
Mar 13 16:25:05 sh-101-19.int kernel: [] mm_fault_error+0x6a/0x157
Mar 13 16:25:05 sh-101-19.int kernel: [] __do_page_fault+0x3c8/0x500
Mar 13 16:25:05 sh-101-19.int kernel: [] do_page_fault+0x35/0x90
Mar 13 16:25:05 sh-101-19.int kernel: [] page_fault+0x28/0x30
Mar 13 16:25:05 sh-101-19.int kernel: Task in /slurm/uid_286587/job_39054188/step_0/task_0 killed as a result of limit of /slurm/uid_286587/job_39054188
Mar 13 16:25:05 sh-101-19.int kernel: memory: usage 61440000kB, limit 61440000kB, failcnt 10963
Mar 13 16:25:05 sh-101-19.int kernel: memory+swap: usage 61440000kB, limit 61440000kB, failcnt 1
Mar 13 16:25:05 sh-101-19.int kernel: kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
Mar 13 16:25:05 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39054188: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 13 16:25:05 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39054188/step_extern: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 13 16:25:05 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39054188/step_extern/task_0: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 13 16:25:05 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39054188/step_batch: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 13 16:25:05 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39054188/step_batch/task_0: cache:0KB rss:4304KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:2296KB active_anon:2008KB
inactive_file:0KB active_file:0KB unevictable:0KB
Mar 13 16:25:05 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39054188/step_0: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 13 16:25:05 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39054188/step_0/task_0: cache:4268KB rss:61431392KB rss_huge:0KB mapped_file:4268KB swap:0KB inactive_anon:3920540KB active_anon:57514980KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 13 16:25:05 sh-101-19.int kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Mar 13 16:25:05 sh-101-19.int kernel: [127542] 0 127542 26988 89 9 0 0 sleep
Mar 13 16:25:05 sh-101-19.int kernel: [127552] 286587 127552 28329 410 14 0 0 slurm_script
Mar 13 16:25:05 sh-101-19.int kernel: [127559] 286587 127559 80960 1480 38 0 0 srun
Mar 13 16:25:05 sh-101-19.int kernel: [127560] 286587 127560 13093 215 28 0 0 srun
Mar 13 16:25:05 sh-101-19.int kernel: [127578] 286587 127578 219292 71635 242 0 0 julia
Mar 13 16:25:05 sh-101-19.int kernel: [127585] 286587 127585 5276867 5182123 10221 0 0 julia
Mar 13 16:25:05 sh-101-19.int kernel: [127586] 286587 127586 5302685 5171589 10201 0 0 julia
Mar 13 16:25:05 sh-101-19.int kernel: [127587] 286587 127587 5132906 4985443 9834 0 0 julia
Mar 13 16:25:05 sh-101-19.int kernel: Memory cgroup out of memory: Kill process 127588 (julia) score 338 or sacrifice child
Mar 13 16:25:05 sh-101-19.int kernel: Killed process 127585 (julia) total-vm:21107468kB, anon-rss:20674568kB, file-rss:51848kB, shmem-rss:2076kB
Mar 13 16:25:05 sh-101-19.int kernel: julia invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0
Mar 13 16:25:05 sh-101-19.int kernel: julia cpuset=step_0 mems_allowed=0-1
Mar 13 16:25:05 sh-101-19.int kernel: CPU: 5 PID: 127578 Comm: julia Kdump: loaded Tainted: G OE ------------ T 3.10.0-957.5.1.el7.x86_64 #1
Mar 13 16:25:05 sh-101-19.int kernel: Hardware name: Dell Inc. PowerEdge C6320/082F9M, BIOS 2.8.0 05/28/2018
Mar 13 16:25:05 sh-101-19.int kernel: Call Trace:
Mar 13 16:25:05 sh-101-19.int kernel: [] dump_stack+0x19/0x1b
Mar 13 16:25:05 sh-101-19.int kernel: [] dump_header+0x90/0x229
Mar 13 16:25:05 sh-101-19.int kernel: [] ? default_wake_function+0x12/0x20
Mar 13 16:25:05 sh-101-19.int kernel: [] ? find_lock_task_mm+0x56/0xc0
Mar 13 16:25:05 sh-101-19.int kernel: [] ? try_get_mem_cgroup_from_mm+0x28/0x60
Mar 13 16:25:05 sh-101-19.int kernel: [] oom_kill_process+0x254/0x3d0
Mar 13 16:25:05 sh-101-19.int kernel: [] mem_cgroup_oom_synchronize+0x546/0x570
Mar 13 16:25:05 sh-101-19.int kernel: [] ? mem_cgroup_charge_common+0xc0/0xc0
Mar 13 16:25:05 sh-101-19.int kernel: [] pagefault_out_of_memory+0x14/0x90
Mar 13 16:25:05 sh-101-19.int kernel: [] mm_fault_error+0x6a/0x157
Mar 13 16:25:05 sh-101-19.int kernel: [] __do_page_fault+0x3c8/0x500
Mar 13 16:25:05 sh-101-19.int kernel: [] do_page_fault+0x35/0x90
Mar 13 16:25:05 sh-101-19.int kernel: [] page_fault+0x28/0x30
Mar 13 16:25:05 sh-101-19.int kernel: Task in /slurm/uid_286587/job_39054188/step_0/task_0 killed as a result of limit of /slurm/uid_286587/job_39054188
Mar 13 16:25:05 sh-101-19.int kernel: memory: usage 61440000kB, limit 61440000kB, failcnt 11547
Mar 13 16:25:05 sh-101-19.int kernel: memory+swap: usage 61440000kB, limit 61440000kB, failcnt 1
Mar 13 16:25:05 sh-101-19.int kernel: kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
Mar 13 16:25:05 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39054188: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 13 16:25:05 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39054188/step_extern: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 13 16:25:05 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39054188/step_extern/task_0: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 13 16:25:05 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39054188/step_batch: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 13 16:25:05 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39054188/step_batch/task_0: cache:0KB rss:4304KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:2296KB active_anon:2008KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 13 16:25:05 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39054188/step_0: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 13 16:25:05 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39054188/step_0/task_0: cache:4308KB rss:59414224KB rss_huge:0KB mapped_file:4276KB swap:0KB inactive_anon:3595592KB active_anon:55778528KB inactive_file:32KB active_file:0KB unevictable:0KB
Mar 13 16:25:05 sh-101-19.int kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Mar 13 16:25:05 sh-101-19.int kernel: [127542] 0 127542 26988 89 9 0 0 sleep
Mar 13 16:25:05 sh-101-19.int kernel: [127552] 286587 127552 28329 410 14 0 0 slurm_script
Mar 13 16:25:05 sh-101-19.int kernel: [127559] 286587 127559 80960 1480 38 0 0 srun
Mar 13 16:25:05 sh-101-19.int kernel: [127560] 286587 127560 13093 215 28 0 0 srun
Mar 13 16:25:05 sh-101-19.int kernel: [127578] 286587 127578 219292 71636 242 0 0 julia
Mar 13 16:25:05 sh-101-19.int kernel: [127586] 286587 127586 5302685 5171590 10201 0 0 julia
Mar 13 16:25:05 sh-101-19.int kernel: [127587] 286587 127587 5132906 4985458 9834 0 0 julia
Mar 13 16:25:05 sh-101-19.int kernel: Memory
cgroup out of memory: Kill process 127599 (julia) score 337 or sacrifice child
Mar 13 16:25:05 sh-101-19.int kernel: Killed process 127586 (julia) total-vm:21210740kB, anon-rss:20633304kB, file-rss:50984kB, shmem-rss:2072kB
Mar 13 16:25:06 sh-101-19.int kernel: julia invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0
Mar 13 16:25:06 sh-101-19.int kernel: julia cpuset=step_0 mems_allowed=0-1
Mar 13 16:25:06 sh-101-19.int kernel: CPU: 5 PID: 127578 Comm: julia Kdump: loaded Tainted: G OE ------------ T 3.10.0-957.5.1.el7.x86_64 #1
Mar 13 16:25:06 sh-101-19.int kernel: Hardware name: Dell Inc. PowerEdge C6320/082F9M, BIOS 2.8.0 05/28/2018
Mar 13 16:25:06 sh-101-19.int kernel: Call Trace:
Mar 13 16:25:06 sh-101-19.int kernel: [] dump_stack+0x19/0x1b
Mar 13 16:25:06 sh-101-19.int kernel: [] dump_header+0x90/0x229
Mar 13 16:25:06 sh-101-19.int kernel: [] ? default_wake_function+0x12/0x20
Mar 13 16:25:06 sh-101-19.int kernel: [] ? find_lock_task_mm+0x56/0xc0
Mar 13 16:25:06 sh-101-19.int kernel: [] ? try_get_mem_cgroup_from_mm+0x28/0x60
Mar 13 16:25:06 sh-101-19.int kernel: [] oom_kill_process+0x254/0x3d0
Mar 13 16:25:06 sh-101-19.int kernel: [] mem_cgroup_oom_synchronize+0x546/0x570
Mar 13 16:25:06 sh-101-19.int kernel: [] ? mem_cgroup_charge_common+0xc0/0xc0
Mar 13 16:25:06 sh-101-19.int kernel: [] pagefault_out_of_memory+0x14/0x90
Mar 13 16:25:06 sh-101-19.int kernel: [] mm_fault_error+0x6a/0x157
Mar 13 16:25:06 sh-101-19.int kernel: [] __do_page_fault+0x3c8/0x500
Mar 13 16:25:06 sh-101-19.int kernel: [] do_page_fault+0x35/0x90
Mar 13 16:25:06 sh-101-19.int kernel: [] page_fault+0x28/0x30
Mar 13 16:25:06 sh-101-19.int kernel: Task in /slurm/uid_286587/job_39054188/step_0/task_0 killed as a result of limit of /slurm/uid_286587/job_39054188
Mar 13 16:25:06 sh-101-19.int kernel: memory: usage 61440000kB, limit 61440000kB, failcnt 12021
Mar 13 16:25:06 sh-101-19.int kernel: memory+swap: usage 61440000kB, limit 61440000kB, failcnt 1
Mar 13 16:25:06 sh-101-19.int kernel: kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
Mar 13 16:25:06 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39054188: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 13 16:25:06 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39054188/step_extern: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 13 16:25:06 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39054188/step_extern/task_0: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 13 16:25:06 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39054188/step_batch: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 13 16:25:06 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39054188/step_batch/task_0: cache:0KB rss:4304KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:2296KB active_anon:2008KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 13 16:25:06 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39054188/step_0: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 13 16:25:06 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39054188/step_0/task_0: cache:4308KB rss:54797996KB rss_huge:0KB mapped_file:4276KB swap:0KB inactive_anon:3569788KB active_anon:51170036KB inactive_file:32KB active_file:0KB unevictable:0KB
Mar 13 16:25:06 sh-101-19.int kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Mar 13 16:25:06 sh-101-19.int kernel: [127542] 0 127542 26988 89 9 0 0 sleep
Mar 13 16:25:06 sh-101-19.int kernel: [127552] 286587 127552 28329 410 14 0 0 slurm_script
Mar 13 16:25:06 sh-101-19.int kernel: [127559] 286587 127559 80960 1480 38 0 0 srun
Mar 13 16:25:06 sh-101-19.int kernel: [127560] 286587 127560 13093 215 28 0 0 srun
Mar 13 16:25:06 sh-101-19.int kernel: [127578] 286587 127578 219292 71636 242 0 0 julia
Mar 13 16:25:06 sh-101-19.int kernel: [127587] 286587 127587 5132906 4985458 9834 0 0 julia
Mar 13 16:25:06 sh-101-19.int kernel: Memory cgroup out of memory: Kill process 127603 (julia) score 325 or sacrifice child
Mar 13 16:25:06 sh-101-19.int kernel: Killed process 127587 (julia) total-vm:20531624kB, anon-rss:19888848kB, file-rss:50924kB, shmem-rss:2060kB
Mar 14 02:12:07 sh-101-19.int kernel: Lustre: 93480:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552554718/real 1552554718] req@ffff9c0587aad700 x1626168320045344/t0(0) o400->regal-MDT0000-mdc-ffff9bf5d468d800@10.210.34.202@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1552554725 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Mar 14 02:12:07 sh-101-19.int kernel: Lustre: regal-MDT0000-mdc-ffff9bf5d468d800: Connection to regal-MDT0000 (at 10.210.34.202@o2ib1) was lost; in progress operations using this
service will wait for recovery to complete
Mar 14 02:13:28 sh-101-19.int kernel: Lustre: 93465:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552554802/real 1552554802] req@ffff9bed34c3ef00 x1626168320556640/t0(0) o400->regal-MDT0000-mdc-ffff9bf5d468d800@10.210.34.201@o2ib1:12/10 lens 224/224 e 1 to 1 dl 1552554804 ref 1 fl Rpc:X/c0/ffffffff rc 0/-1
Mar 14 02:13:34 sh-101-19.int kernel: Lustre: 93465:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552554808/real 1552554808] req@ffff9bf2b9f13900 x1626168320593200/t0(0) o400->regal-MDT0000-mdc-ffff9bf5d468d800@10.210.34.201@o2ib1:12/10 lens 224/224 e 1 to 1 dl 1552554810 ref 1 fl Rpc:X/c0/ffffffff rc 0/-1
Mar 14 02:13:40 sh-101-19.int kernel: Lustre: 93465:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552554814/real 1552554814] req@ffff9be5c17bc500 x1626168320630832/t0(0) o400->regal-MDT0000-mdc-ffff9bf5d468d800@10.210.34.201@o2ib1:12/10 lens 224/224 e 1 to 1 dl 1552554816 ref 1 fl Rpc:X/c0/ffffffff rc 0/-1
Mar 14 02:13:46 sh-101-19.int kernel: Lustre: 93465:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552554820/real 1552554820] req@ffff9be5c17bc500 x1626168320665920/t0(0) o400->regal-MDT0000-mdc-ffff9bf5d468d800@10.210.34.201@o2ib1:12/10 lens 224/224 e 1 to 1 dl 1552554822 ref 1 fl Rpc:X/c0/ffffffff rc 0/-1
Mar 14 02:13:58 sh-101-19.int kernel: Lustre: 93465:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552554832/real 1552554832] req@ffff9be5c17bc500 x1626168320738864/t0(0) o400->regal-MDT0000-mdc-ffff9bf5d468d800@10.210.34.201@o2ib1:12/10 lens 224/224 e 1 to 1 dl 1552554834 ref 1 fl Rpc:X/c0/ffffffff rc 0/-1
Mar 14 02:13:58 sh-101-19.int kernel: Lustre: 93465:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Mar 14 02:14:16 sh-101-19.int kernel: Lustre: 93465:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552554850/real 1552554850] req@ffff9be5c17bc500 x1626168320846672/t0(0) o400->regal-MDT0000-mdc-ffff9bf5d468d800@10.210.34.201@o2ib1:12/10 lens 224/224 e 1 to 1 dl 1552554852 ref 1 fl Rpc:X/c0/ffffffff rc 0/-1
Mar 14 02:14:16 sh-101-19.int kernel: Lustre: 93465:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
Mar 14 02:14:52 sh-101-19.int kernel: Lustre: 93465:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552554886/real 1552554886] req@ffff9bf6a2dd9200 x1626168321062208/t0(0) o400->regal-MDT0000-mdc-ffff9bf5d468d800@10.210.34.201@o2ib1:12/10 lens 224/224 e 1 to 1 dl 1552554888 ref 1 fl Rpc:X/c0/ffffffff rc 0/-1
Mar 14 02:14:52 sh-101-19.int kernel: Lustre: 93465:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages
Mar 14 02:15:25 sh-101-19.int kernel: Lustre: regal-MDT0000-mdc-ffff9bf5d468d800: Connection restored to 10.210.34.201@o2ib1 (at 10.210.34.201@o2ib1)
Mar 14 02:15:59 sh-101-19.int kernel: Lustre: 93484:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552554952/real 1552554952] req@ffff9bf8c5805700 x1626168321448352/t0(0) o400->MGC10.210.34.201@o2ib1@10.210.34.201@o2ib1:26/25 lens 224/224 e 0 to 1 dl 1552554959 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Mar 14 02:15:59 sh-101-19.int kernel: Lustre: 93484:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages
Mar 14 02:15:59 sh-101-19.int kernel: Lustre: regal-MDT0000-mdc-ffff9bf5d468d800: Connection to regal-MDT0000 (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will wait for recovery to complete
Mar 14 02:15:59 sh-101-19.int kernel: LustreError: 166-1: MGC10.210.34.201@o2ib1: Connection to MGS (at 10.210.34.201@o2ib1) was lost; in progress operations using this service will fail
Mar 14 02:21:42 sh-101-19.int kernel: LustreError: 39891:0:(file.c:4393:ll_inode_revalidate_fini()) regal: revalidate FID [0x31100001:0xc3b46127:0x0] error: rc = -4
Mar 14 02:22:40 sh-101-19.int kernel: Lustre: Evicted from MGS (at MGC10.210.34.201@o2ib1_0) after server handle changed from 0x5099a72de8d6fd7 to 0xbca71a6465af067
Mar 14 02:22:40 sh-101-19.int kernel: Lustre: MGC10.210.34.201@o2ib1: Connection restored to MGC10.210.34.201@o2ib1_0 (at 10.210.34.201@o2ib1)
Mar 14 02:23:23 sh-101-19.int kernel: Lustre: 93465:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552555396/real 1552555396] req@ffff9bed34c3da00 x1626168324009552/t0(0) o400->regal-MDT0000-mdc-ffff9bf5d468d800@10.210.34.202@o2ib1:12/10 lens 224/224 e 1 to 1 dl 1552555398 ref 1 fl Rpc:X/c0/ffffffff rc 0/-1
Mar 14 02:27:07 sh-101-19.int kernel: Lustre: regal-MDT0000-mdc-ffff9bf5d468d800: Connection restored to 10.210.34.202@o2ib1 (at 10.210.34.202@o2ib1)
Mar 14 04:13:50 sh-101-19.int kernel: Lustre: 173653:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552561999/real 1552561999] req@ffff9bebba279e00 x1626168352725472/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 328/344 e 0 to 1 dl 1552562030 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
Mar 14 04:13:50 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 14 04:13:50 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 14 04:13:50 sh-101-19.int kernel: Lustre: 173653:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 38 previous similar messages
Mar 14 04:14:21 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was
lost; in progress operations using this service will wait for recovery to complete Mar 14 04:14:21 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 14 04:14:52 sh-101-19.int kernel: Lustre: 173653:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552562061/real 1552562061] req@ffff9bebba279e00 x1626168352728432/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 328/344 e 0 to 1 dl 1552562092 ref 2 fl Rpc:X/2/ffffffff rc -11/-1 Mar 14 04:14:52 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 14 04:14:52 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 14 04:14:52 sh-101-19.int kernel: Lustre: 173653:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages Mar 14 04:15:13 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 14 04:15:13 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 14 04:15:34 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 14 04:15:34 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 14 04:15:55 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to 
complete Mar 14 04:15:55 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 14 04:16:16 sh-101-19.int kernel: Lustre: 173653:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552562155/real 1552562155] req@ffff9bf6a2ddbc00 x1626168352744784/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 328/344 e 1 to 1 dl 1552562176 ref 2 fl Rpc:X/2/ffffffff rc -11/-1 Mar 14 04:16:16 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 14 04:16:16 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 14 04:16:16 sh-101-19.int kernel: Lustre: 173653:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 9 previous similar messages Mar 14 04:16:58 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 14 04:16:58 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 14 04:16:58 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 14 04:16:58 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 14 04:18:22 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 14 04:18:22 sh-101-19.int kernel: Lustre: Skipped 3 previous similar messages Mar 14 04:18:22 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 14 04:18:22 sh-101-19.int kernel: 
Lustre: Skipped 3 previous similar messages Mar 14 04:18:43 sh-101-19.int kernel: Lustre: 191779:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552562302/real 1552562302] req@ffff9bff26583f00 x1626168352771760/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 328/344 e 1 to 1 dl 1552562323 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Mar 14 04:18:43 sh-101-19.int kernel: Lustre: 191779:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 35 previous similar messages Mar 14 04:20:49 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 14 04:20:49 sh-101-19.int kernel: Lustre: Skipped 6 previous similar messages Mar 14 04:20:49 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 14 04:20:49 sh-101-19.int kernel: Lustre: Skipped 6 previous similar messages Mar 14 04:23:16 sh-101-19.int kernel: Lustre: 111494:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552562575/real 1552562575] req@ffff9bf5df2b7800 x1626168352812256/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 328/344 e 1 to 1 dl 1552562596 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Mar 14 04:23:16 sh-101-19.int kernel: Lustre: 111494:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 30 previous similar messages Mar 14 04:25:22 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 14 04:25:22 sh-101-19.int kernel: Lustre: Skipped 12 previous similar messages Mar 14 04:25:22 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 14 
04:25:22 sh-101-19.int kernel: Lustre: Skipped 12 previous similar messages Mar 14 04:31:48 sh-101-19.int kernel: Lustre: 192536:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552563087/real 1552563087] req@ffff9be9d0d66900 x1626168352879952/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 328/344 e 1 to 1 dl 1552563108 ref 2 fl Rpc:X/2/ffffffff rc -11/-1 Mar 14 04:31:48 sh-101-19.int kernel: Lustre: 192536:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 68 previous similar messages Mar 14 04:34:04 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 14 04:34:04 sh-101-19.int kernel: Lustre: Skipped 21 previous similar messages Mar 14 04:34:04 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 14 04:34:04 sh-101-19.int kernel: Lustre: Skipped 21 previous similar messages Mar 14 04:41:57 sh-101-19.int kernel: Lustre: 171882:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552563696/real 1552563696] req@ffff9c05f224d400 x1626168352976352/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 328/344 e 1 to 1 dl 1552563717 ref 2 fl Rpc:X/2/ffffffff rc -11/-1 Mar 14 04:41:57 sh-101-19.int kernel: Lustre: 171882:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 80 previous similar messages Mar 14 04:44:23 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 14 04:44:23 sh-101-19.int kernel: Lustre: Skipped 27 previous similar messages Mar 14 04:44:23 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 
10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 14 04:44:23 sh-101-19.int kernel: Lustre: Skipped 27 previous similar messages Mar 14 04:52:16 sh-101-19.int kernel: Lustre: 5838:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552564315/real 1552564315] req@ffff9bf9261b1200 x1626168353069584/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 328/344 e 1 to 1 dl 1552564336 ref 2 fl Rpc:X/2/ffffffff rc -11/-1 Mar 14 04:52:16 sh-101-19.int kernel: Lustre: 5838:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 71 previous similar messages Mar 14 04:54:43 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 14 04:54:43 sh-101-19.int kernel: Lustre: Skipped 28 previous similar messages Mar 14 04:54:43 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 14 04:54:43 sh-101-19.int kernel: Lustre: Skipped 28 previous similar messages Mar 14 05:02:32 sh-101-19.int kernel: Lustre: 45306:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552564921/real 1552564921] req@ffff9c04be280f00 x1626168353157824/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 328/344 e 0 to 1 dl 1552564952 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Mar 14 05:02:32 sh-101-19.int kernel: Lustre: 45306:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 107 previous similar messages Mar 14 05:05:07 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 14 05:05:07 sh-101-19.int kernel: Lustre: Skipped 23 previous similar messages Mar 14 05:05:07 sh-101-19.int kernel: Lustre: 
fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 14 05:05:07 sh-101-19.int kernel: Lustre: Skipped 23 previous similar messages Mar 14 05:12:34 sh-101-19.int kernel: Lustre: 173650:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552565533/real 1552565533] req@ffff9c05f3b2d400 x1626168353223312/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 328/344 e 1 to 1 dl 1552565554 ref 2 fl Rpc:X/2/ffffffff rc -11/-1 Mar 14 05:12:34 sh-101-19.int kernel: Lustre: 173650:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 92 previous similar messages Mar 14 05:15:22 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 14 05:15:22 sh-101-19.int kernel: Lustre: Skipped 24 previous similar messages Mar 14 05:15:22 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 14 05:15:22 sh-101-19.int kernel: Lustre: Skipped 24 previous similar messages Mar 14 05:22:43 sh-101-19.int kernel: Lustre: 45305:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552566142/real 1552566142] req@ffff9be91158e900 x1626168353317264/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 328/344 e 1 to 1 dl 1552566163 ref 2 fl Rpc:X/2/ffffffff rc -11/-1 Mar 14 05:22:43 sh-101-19.int kernel: Lustre: 45305:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 102 previous similar messages Mar 14 05:25:50 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 14 05:25:50 sh-101-19.int kernel: Lustre: Skipped 27 previous similar messages Mar 14 
05:25:50 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 14 05:25:50 sh-101-19.int kernel: Lustre: Skipped 27 previous similar messages Mar 14 05:32:55 sh-101-19.int kernel: Lustre: 45604:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552566754/real 1552566754] req@ffff9bfa27042a00 x1626168353396112/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 328/344 e 1 to 1 dl 1552566775 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Mar 14 05:32:55 sh-101-19.int kernel: Lustre: 45604:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 81 previous similar messages Mar 14 05:36:04 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 14 05:36:04 sh-101-19.int kernel: Lustre: Skipped 22 previous similar messages Mar 14 05:36:04 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 14 05:36:04 sh-101-19.int kernel: Lustre: Skipped 22 previous similar messages Mar 14 05:43:14 sh-101-19.int kernel: Lustre: 89529:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552567373/real 1552567373] req@ffff9bff7d033600 x1626168353479424/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 328/344 e 1 to 1 dl 1552567394 ref 2 fl Rpc:X/2/ffffffff rc -11/-1 Mar 14 05:43:14 sh-101-19.int kernel: Lustre: 171882:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552567373/real 1552567373] req@ffff9c0583f00000 x1626168353482336/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 328/344 e 1 to 1 dl 1552567394 ref 2 fl Rpc:X/2/ffffffff rc -11/-1 Mar 14 05:43:14 sh-101-19.int kernel: Lustre: 
171882:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 74 previous similar messages Mar 14 05:46:22 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 14 05:46:22 sh-101-19.int kernel: Lustre: Skipped 27 previous similar messages Mar 14 05:46:22 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 14 05:46:22 sh-101-19.int kernel: Lustre: Skipped 27 previous similar messages Mar 14 05:53:21 sh-101-19.int kernel: Lustre: 196401:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552567980/real 1552567980] req@ffff9bfbbd60b000 x1626168353575008/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 328/344 e 1 to 1 dl 1552568001 ref 2 fl Rpc:X/2/ffffffff rc 0/-1 Mar 14 05:53:21 sh-101-19.int kernel: Lustre: 45305:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552567980/real 1552567980] req@ffff9bebb523a700 x1626168353575232/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 328/344 e 1 to 1 dl 1552568001 ref 2 fl Rpc:X/2/ffffffff rc -11/-1 Mar 14 05:53:21 sh-101-19.int kernel: Lustre: 45305:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 95 previous similar messages Mar 14 05:53:21 sh-101-19.int kernel: Lustre: 196401:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages Mar 14 05:56:29 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 14 05:56:29 sh-101-19.int kernel: Lustre: Skipped 26 previous similar messages Mar 14 05:56:29 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 
10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 14 05:56:29 sh-101-19.int kernel: Lustre: Skipped 26 previous similar messages Mar 14 06:03:28 sh-101-19.int kernel: Lustre: 173649:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552568587/real 1552568587] req@ffff9be91154ce00 x1626168353670432/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 328/344 e 1 to 1 dl 1552568608 ref 2 fl Rpc:X/2/ffffffff rc -11/-1 Mar 14 06:03:28 sh-101-19.int kernel: Lustre: 173649:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 73 previous similar messages Mar 14 06:06:37 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 14 06:06:37 sh-101-19.int kernel: Lustre: Skipped 27 previous similar messages Mar 14 06:06:37 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 14 06:06:37 sh-101-19.int kernel: Lustre: Skipped 27 previous similar messages Mar 14 06:13:47 sh-101-19.int kernel: Lustre: 173648:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552569196/real 1552569196] req@ffff9bfca7af9500 x1626168353752912/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 328/344 e 0 to 1 dl 1552569227 ref 2 fl Rpc:X/2/ffffffff rc -11/-1 Mar 14 06:13:47 sh-101-19.int kernel: Lustre: 173649:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552569196/real 1552569196] req@ffff9bf5d17f9500 x1626168353763808/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 328/344 e 0 to 1 dl 1552569227 ref 2 fl Rpc:X/2/ffffffff rc -11/-1 Mar 14 06:13:47 sh-101-19.int kernel: Lustre: 173649:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 82 previous similar messages Mar 
14 06:16:56 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 14 06:16:56 sh-101-19.int kernel: Lustre: Skipped 28 previous similar messages Mar 14 06:16:56 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 14 06:16:56 sh-101-19.int kernel: Lustre: Skipped 28 previous similar messages Mar 14 06:23:57 sh-101-19.int kernel: Lustre: 173649:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552569816/real 1552569816] req@ffff9bf81a051b00 x1626168353862544/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 328/344 e 1 to 1 dl 1552569837 ref 2 fl Rpc:X/2/ffffffff rc -11/-1 Mar 14 06:23:57 sh-101-19.int kernel: Lustre: 173649:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 67 previous similar messages Mar 14 06:27:04 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 14 06:27:04 sh-101-19.int kernel: Lustre: Skipped 26 previous similar messages Mar 14 06:27:04 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 14 06:27:04 sh-101-19.int kernel: Lustre: Skipped 26 previous similar messages Mar 14 06:34:16 sh-101-19.int kernel: Lustre: 5837:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552570435/real 1552570435] req@ffff9be91154fb00 x1626168353951872/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 328/344 e 1 to 1 dl 1552570456 ref 2 fl Rpc:X/2/ffffffff rc -11/-1 Mar 14 06:34:16 sh-101-19.int kernel: Lustre: 5837:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 
93 previous similar messages Mar 14 06:37:04 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 14 06:37:04 sh-101-19.int kernel: Lustre: Skipped 27 previous similar messages Mar 14 06:37:04 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 14 06:37:04 sh-101-19.int kernel: Lustre: Skipped 27 previous similar messages Mar 14 06:44:22 sh-101-19.int kernel: Lustre: 120332:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552571031/real 1552571031] req@ffff9c04c675b600 x1626168354025696/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 328/344 e 0 to 1 dl 1552571062 ref 2 fl Rpc:X/2/ffffffff rc -11/-1 Mar 14 06:44:22 sh-101-19.int kernel: Lustre: 120332:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 75 previous similar messages Mar 14 06:47:21 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 14 06:47:21 sh-101-19.int kernel: Lustre: Skipped 26 previous similar messages Mar 14 06:47:21 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 14 06:47:21 sh-101-19.int kernel: Lustre: Skipped 26 previous similar messages Mar 14 06:54:43 sh-101-19.int kernel: Lustre: 41152:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552571662/real 1552571662] req@ffff9bf9261b1800 x1626168354119792/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 328/344 e 1 to 1 dl 1552571683 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Mar 14 06:54:43 sh-101-19.int kernel: Lustre: 
41152:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 78 previous similar messages Mar 14 06:57:31 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 14 06:57:31 sh-101-19.int kernel: Lustre: Skipped 26 previous similar messages Mar 14 06:57:31 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 14 06:57:31 sh-101-19.int kernel: Lustre: Skipped 26 previous similar messages Mar 14 07:04:52 sh-101-19.int kernel: Lustre: 173653:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552572271/real 1552572271] req@ffff9bebffcf1200 x1626168354208624/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 328/344 e 1 to 1 dl 1552572292 ref 2 fl Rpc:X/2/ffffffff rc -11/-1 Mar 14 07:04:52 sh-101-19.int kernel: Lustre: 173653:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 112 previous similar messages Mar 14 07:07:40 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 14 07:07:40 sh-101-19.int kernel: Lustre: Skipped 28 previous similar messages Mar 14 07:07:40 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 14 07:07:40 sh-101-19.int kernel: Lustre: Skipped 28 previous similar messages Mar 14 07:15:01 sh-101-19.int kernel: Lustre: 192536:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552572880/real 1552572880] req@ffff9be7a1e6a400 x1626168354306768/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 328/344 e 1 to 1 dl 1552572901 ref 2 fl Rpc:X/2/ffffffff rc -11/-1 Mar 14 
07:15:01 sh-101-19.int kernel: Lustre: 192536:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 108 previous similar messages Mar 14 07:17:49 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 14 07:17:49 sh-101-19.int kernel: Lustre: Skipped 28 previous similar messages Mar 14 07:17:49 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 14 07:17:49 sh-101-19.int kernel: Lustre: Skipped 28 previous similar messages Mar 14 07:25:10 sh-101-19.int kernel: Lustre: 173652:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552573489/real 1552573489] req@ffff9bf1b8436900 x1626168354394112/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 328/344 e 1 to 1 dl 1552573510 ref 2 fl Rpc:X/2/ffffffff rc -11/-1 Mar 14 07:25:10 sh-101-19.int kernel: Lustre: 173652:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 108 previous similar messages Mar 14 07:27:58 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 14 07:27:58 sh-101-19.int kernel: Lustre: Skipped 28 previous similar messages Mar 14 07:27:58 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 14 07:27:58 sh-101-19.int kernel: Lustre: Skipped 28 previous similar messages Mar 14 07:35:19 sh-101-19.int kernel: Lustre: 173653:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552574098/real 1552574098] req@ffff9c03f5fa1800 x1626168354500000/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 328/344 e 1 to 1 dl 1552574119 ref 
2 fl Rpc:X/2/ffffffff rc 0/-1 Mar 14 07:35:19 sh-101-19.int kernel: Lustre: 173653:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 102 previous similar messages Mar 14 07:38:07 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 14 07:38:07 sh-101-19.int kernel: Lustre: Skipped 28 previous similar messages Mar 14 07:38:07 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 14 07:38:07 sh-101-19.int kernel: Lustre: Skipped 28 previous similar messages Mar 14 07:45:28 sh-101-19.int kernel: Lustre: 191779:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552574707/real 1552574707] req@ffff9be8d917d700 x1626168354594960/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 328/344 e 1 to 1 dl 1552574728 ref 2 fl Rpc:X/2/ffffffff rc -11/-1 Mar 14 07:45:28 sh-101-19.int kernel: Lustre: 191779:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 118 previous similar messages Mar 14 07:48:16 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 14 07:48:16 sh-101-19.int kernel: Lustre: Skipped 28 previous similar messages Mar 14 07:48:16 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 14 07:48:16 sh-101-19.int kernel: Lustre: Skipped 28 previous similar messages Mar 14 07:55:37 sh-101-19.int kernel: Lustre: 89529:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552575316/real 1552575316] req@ffff9bf1b84bcb00 x1626168354690592/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 
328/344 e 1 to 1 dl 1552575337 ref 2 fl Rpc:X/2/ffffffff rc -11/-1 Mar 14 07:55:37 sh-101-19.int kernel: Lustre: 111494:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552575316/real 1552575316] req@ffff9bf1b8434500 x1626168354687408/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 328/344 e 1 to 1 dl 1552575337 ref 2 fl Rpc:X/2/ffffffff rc 0/-1 Mar 14 07:55:37 sh-101-19.int kernel: Lustre: 111494:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 81 previous similar messages Mar 14 07:57:12 sh-101-19.int kernel: LustreError: 11-0: fir-MDT0001-mdc-ffff9c05b8776000: operation ldlm_enqueue to node 10.0.10.52@o2ib7 failed: rc = -19 Mar 14 07:57:12 sh-101-19.int kernel: LustreError: Skipped 9 previous similar messages Mar 14 07:58:58 sh-101-19.int kernel: LustreError: 93465:0:(client.c:3023:ptlrpc_replay_interpret()) @@@ status 301, old was 0 req@ffff9c052623a400 x1626168352725312/t94682557199(94682557199) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 1144/560 e 0 to 0 dl 1552575569 ref 2 fl Interpret:RP/4/0 rc 301/301 Mar 14 07:58:58 sh-101-19.int kernel: LustreError: 93465:0:(client.c:3023:ptlrpc_replay_interpret()) Skipped 5 previous similar messages Mar 14 07:59:39 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7) Mar 14 07:59:39 sh-101-19.int kernel: Lustre: Skipped 25 previous similar messages Mar 14 11:24:17 sh-101-19.int kernel: julia invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0 Mar 14 11:24:17 sh-101-19.int kernel: julia cpuset=step_0 mems_allowed=0-1 Mar 14 11:24:17 sh-101-19.int kernel: CPU: 2 PID: 35683 Comm: julia Kdump: loaded Tainted: G OE ------------ T 3.10.0-957.5.1.el7.x86_64 #1 Mar 14 11:24:17 sh-101-19.int kernel: Hardware name: Dell Inc. 
PowerEdge C6320/082F9M, BIOS 2.8.0 05/28/2018
Mar 14 11:24:17 sh-101-19.int kernel: Call Trace:
Mar 14 11:24:17 sh-101-19.int kernel: [] dump_stack+0x19/0x1b
Mar 14 11:24:17 sh-101-19.int kernel: [] dump_header+0x90/0x229
Mar 14 11:24:17 sh-101-19.int kernel: [] ? default_wake_function+0x12/0x20
Mar 14 11:24:17 sh-101-19.int kernel: [] ? find_lock_task_mm+0x56/0xc0
Mar 14 11:24:17 sh-101-19.int kernel: [] ? try_get_mem_cgroup_from_mm+0x28/0x60
Mar 14 11:24:17 sh-101-19.int kernel: [] oom_kill_process+0x254/0x3d0
Mar 14 11:24:17 sh-101-19.int kernel: [] mem_cgroup_oom_synchronize+0x546/0x570
Mar 14 11:24:17 sh-101-19.int kernel: [] ? mem_cgroup_charge_common+0xc0/0xc0
Mar 14 11:24:17 sh-101-19.int kernel: [] pagefault_out_of_memory+0x14/0x90
Mar 14 11:24:17 sh-101-19.int kernel: [] mm_fault_error+0x6a/0x157
Mar 14 11:24:17 sh-101-19.int kernel: [] __do_page_fault+0x3c8/0x500
Mar 14 11:24:17 sh-101-19.int kernel: [] do_page_fault+0x35/0x90
Mar 14 11:24:17 sh-101-19.int kernel: [] page_fault+0x28/0x30
Mar 14 11:24:17 sh-101-19.int kernel: Task in /slurm/uid_286587/job_39095107/step_0/task_0 killed as a result of limit of /slurm/uid_286587/job_39095107
Mar 14 11:24:17 sh-101-19.int kernel: memory: usage 63123440kB, limit 65536000kB, failcnt 0
Mar 14 11:24:17 sh-101-19.int kernel: memory+swap: usage 65536000kB, limit 65536000kB, failcnt 927
Mar 14 11:24:17 sh-101-19.int kernel: kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
Mar 14 11:24:17 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39095107: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 14 11:24:17 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39095107/step_extern: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 14 11:24:17 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39095107/step_extern/task_0: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 14 11:24:17 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39095107/step_batch: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 14 11:24:17 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39095107/step_batch/task_0: cache:0KB rss:1840KB rss_huge:0KB mapped_file:0KB swap:2468KB inactive_anon:940KB active_anon:900KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 14 11:24:17 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39095107/step_0: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 14 11:24:17 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39095107/step_0/task_0: cache:1596KB rss:63119740KB rss_huge:0KB mapped_file:496KB swap:2410116KB inactive_anon:3906900KB active_anon:59213984KB inactive_file:352KB active_file:132KB unevictable:0KB
Mar 14 11:24:17 sh-101-19.int kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Mar 14 11:24:17 sh-101-19.int kernel: [35574] 0 35574 26988 88 8 0 0 sleep
Mar 14 11:24:17 sh-101-19.int kernel: [35615] 286587 35615 28329 377 13 32 0 slurm_script
Mar 14 11:24:17 sh-101-19.int kernel: [35635] 286587 35635 80960 948 36 459 0 srun
Mar 14 11:24:17 sh-101-19.int kernel: [35665] 286587 35665 13093 3 27 212 0 srun
Mar 14 11:24:17 sh-101-19.int kernel: [35683] 286587 35683 16488024 15782286 32120 603236 0 julia
Mar 14 11:24:17 sh-101-19.int kernel: Memory cgroup out of memory: Kill process 35685 (julia) score 1002 or sacrifice child
Mar 14 11:24:17 sh-101-19.int kernel: Killed process 35683 (julia) total-vm:65952096kB, anon-rss:63117216kB, file-rss:11120kB, shmem-rss:804kB
Mar 14 11:25:00 sh-101-19.int kernel: julia invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0
Mar 14 11:25:00 sh-101-19.int kernel: julia cpuset=step_0 mems_allowed=0-1
Mar 14 11:25:00 sh-101-19.int kernel: CPU: 12 PID: 35660 Comm: julia Kdump: loaded Tainted: G OE ------------ T 3.10.0-957.5.1.el7.x86_64 #1
Mar 14 11:25:00 sh-101-19.int kernel: Hardware name: Dell Inc. PowerEdge C6320/082F9M, BIOS 2.8.0 05/28/2018
Mar 14 11:25:00 sh-101-19.int kernel: Call Trace:
Mar 14 11:25:00 sh-101-19.int kernel: [] dump_stack+0x19/0x1b
Mar 14 11:25:00 sh-101-19.int kernel: [] dump_header+0x90/0x229
Mar 14 11:25:00 sh-101-19.int kernel: [] ? default_wake_function+0x12/0x20
Mar 14 11:25:00 sh-101-19.int kernel: [] ? find_lock_task_mm+0x56/0xc0
Mar 14 11:25:00 sh-101-19.int kernel: [] ? try_get_mem_cgroup_from_mm+0x28/0x60
Mar 14 11:25:00 sh-101-19.int kernel: [] oom_kill_process+0x254/0x3d0
Mar 14 11:25:00 sh-101-19.int kernel: [] mem_cgroup_oom_synchronize+0x546/0x570
Mar 14 11:25:00 sh-101-19.int kernel: [] ? mem_cgroup_charge_common+0xc0/0xc0
Mar 14 11:25:00 sh-101-19.int kernel: [] pagefault_out_of_memory+0x14/0x90
Mar 14 11:25:00 sh-101-19.int kernel: [] mm_fault_error+0x6a/0x157
Mar 14 11:25:00 sh-101-19.int kernel: [] __do_page_fault+0x3c8/0x500
Mar 14 11:25:00 sh-101-19.int kernel: [] do_page_fault+0x35/0x90
Mar 14 11:25:00 sh-101-19.int kernel: [] page_fault+0x28/0x30
Mar 14 11:25:00 sh-101-19.int kernel: Task in /slurm/uid_286587/job_39095108/step_0/task_0 killed as a result of limit of /slurm/uid_286587/job_39095108
Mar 14 11:25:00 sh-101-19.int kernel: memory: usage 63985108kB, limit 65536000kB, failcnt 0
Mar 14 11:25:00 sh-101-19.int kernel: memory+swap: usage 65536000kB, limit 65536000kB, failcnt 12703
Mar 14 11:25:00 sh-101-19.int kernel: kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
Mar 14 11:25:00 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39095108: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 14 11:25:00 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39095108/step_extern: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 14 11:25:00 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39095108/step_extern/task_0: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 14 11:25:00 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39095108/step_batch: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 14 11:25:00 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39095108/step_batch/task_0: cache:0KB rss:4024KB rss_huge:0KB mapped_file:0KB swap:292KB inactive_anon:2104KB active_anon:1920KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 14 11:25:00 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39095108/step_0: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 14 11:25:00 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_286587/job_39095108/step_0/task_0: cache:2948KB rss:63978136KB rss_huge:0KB mapped_file:1212KB swap:1550600KB inactive_anon:3894596KB active_anon:60084648KB inactive_file:1064KB active_file:772KB unevictable:0KB
Mar 14 11:25:00 sh-101-19.int kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Mar 14 11:25:00 sh-101-19.int kernel: [35592] 0 35592 26988 71 10 21 0 sleep
Mar 14 11:25:00 sh-101-19.int kernel: [35626] 286587 35626 28329 362 13 48 0 slurm_script
Mar 14 11:25:00 sh-101-19.int kernel: [35633] 286587 35633 80960 1378 37 23 0 srun
Mar 14 11:25:00 sh-101-19.int kernel: [35642] 286587 35642 13093 214 27 2 0 srun
Mar 14 11:25:00 sh-101-19.int kernel: [35660] 286587 35660 16488373 15997429 32119 388373 0 julia
Mar 14 11:25:00 sh-101-19.int kernel: Memory cgroup out of memory: Kill process 35662 (julia) score 1002 or sacrifice child
Mar 14 11:25:00 sh-101-19.int kernel: Killed process 35660 (julia) total-vm:65953492kB, anon-rss:63977532kB, file-rss:10096kB, shmem-rss:2088kB
Mar 14 12:26:45 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 14 12:26:45 sh-101-19.int kernel: Lustre: Skipped 32 previous similar messages
Mar 14 12:27:35 sh-101-19.int kernel: LustreError: 93465:0:(import.c:1247:ptlrpc_connect_interpret()) fir-MDT0000_UUID went back in time (transno 99043070656 was previously committed, server now claims 85900198291)! See https://bugzilla.lustre.org/show_bug.cgi?id=9646
Mar 14 12:27:35 sh-101-19.int kernel: LustreError: 93465:0:(import.c:1247:ptlrpc_connect_interpret()) Skipped 1 previous similar message
Mar 14 12:28:41 sh-101-19.int kernel: LustreError: 93465:0:(import.c:1247:ptlrpc_connect_interpret()) fir-MDT0003_UUID went back in time (transno 99237731920 was previously committed, server now claims 68840102079)! See https://bugzilla.lustre.org/show_bug.cgi?id=9646
Mar 14 12:29:20 sh-101-19.int kernel: Lustre: fir-MDT0000-mdc-ffff9c05b8776000: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7)
Mar 14 12:29:20 sh-101-19.int kernel: Lustre: Skipped 3 previous similar messages
Mar 14 12:31:02 sh-101-19.int kernel: LustreError: 89503:0:(lmv_obd.c:1412:lmv_statfs()) can't stat MDS #0 (fir-MDT0003-mdc-ffff9c05b8776000), error -4
Mar 14 12:31:02 sh-101-19.int kernel: LustreError: 89503:0:(llite_lib.c:1807:ll_statfs_internal()) md_statfs fails: rc = -4
Mar 14 12:32:40 sh-101-19.int kernel: Lustre: fir-MDT0003-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 14 12:32:40 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Mar 14 13:34:37 sh-101-19.int kernel: R invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0
Mar 14 13:34:37 sh-101-19.int kernel: R cpuset=step_batch mems_allowed=0-1
Mar 14 13:34:37 sh-101-19.int kernel: CPU: 14 PID: 95815 Comm: R Kdump: loaded Tainted: G OE ------------ T 3.10.0-957.5.1.el7.x86_64 #1
Mar 14 13:34:37 sh-101-19.int kernel: Hardware name: Dell Inc. PowerEdge C6320/082F9M, BIOS 2.8.0 05/28/2018
Mar 14 13:34:37 sh-101-19.int kernel: Call Trace:
Mar 14 13:34:37 sh-101-19.int kernel: [] dump_stack+0x19/0x1b
Mar 14 13:34:37 sh-101-19.int kernel: [] dump_header+0x90/0x229
Mar 14 13:34:37 sh-101-19.int kernel: [] ? default_wake_function+0x12/0x20
Mar 14 13:34:37 sh-101-19.int kernel: [] ? find_lock_task_mm+0x56/0xc0
Mar 14 13:34:37 sh-101-19.int kernel: [] ? try_get_mem_cgroup_from_mm+0x28/0x60
Mar 14 13:34:37 sh-101-19.int kernel: [] oom_kill_process+0x254/0x3d0
Mar 14 13:34:37 sh-101-19.int kernel: [] mem_cgroup_oom_synchronize+0x546/0x570
Mar 14 13:34:37 sh-101-19.int kernel: [] ? mem_cgroup_charge_common+0xc0/0xc0
Mar 14 13:34:37 sh-101-19.int kernel: [] pagefault_out_of_memory+0x14/0x90
Mar 14 13:34:37 sh-101-19.int kernel: [] mm_fault_error+0x6a/0x157
Mar 14 13:34:37 sh-101-19.int kernel: [] __do_page_fault+0x3c8/0x500
Mar 14 13:34:37 sh-101-19.int kernel: [] do_page_fault+0x35/0x90
Mar 14 13:34:37 sh-101-19.int kernel: [] page_fault+0x28/0x30
Mar 14 13:34:37 sh-101-19.int kernel: Task in /slurm/uid_272736/job_39118725/step_batch/task_0 killed as a result of limit of /slurm/uid_272736/job_39118725/step_batch
Mar 14 13:34:37 sh-101-19.int kernel: memory: usage 83886080kB, limit 83886080kB, failcnt 1627442
Mar 14 13:34:37 sh-101-19.int kernel: memory+swap: usage 83886080kB, limit 83886080kB, failcnt 0
Mar 14 13:34:37 sh-101-19.int kernel: kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
Mar 14 13:34:37 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_272736/job_39118725/step_batch: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 14 13:34:37 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_272736/job_39118725/step_batch/task_0: cache:92KB rss:83885988KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:4422656KB active_anon:79463280KB inactive_file:92KB active_file:0KB unevictable:0KB
Mar 14 13:34:37 sh-101-19.int kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Mar 14 13:34:37 sh-101-19.int kernel: [95427] 272736 95427 28362 454 14 0 0 slurm_script
Mar 14 13:34:37 sh-101-19.int kernel: [95456] 272736 95456 3681761 3396356 6761 0 0 R
Mar 14 13:34:37 sh-101-19.int kernel: [95615] 272736 95615 2750735 2481055 4942 0 0 R
Mar 14 13:34:37 sh-101-19.int kernel: [95643] 272736 95643 2750735 2481054 4941 0 0 R
Mar 14 13:34:37 sh-101-19.int kernel: [95672] 272736 95672 2750735 2481055 4942 0 0 R
Mar 14 13:34:37 sh-101-19.int kernel: [95700] 272736 95700 2750735 2481054 4942 0 0 R
Mar 14 13:34:37 sh-101-19.int kernel: [95729] 272736 95729 2750735 2481054 4938 0 0 R
Mar 14 13:34:37 sh-101-19.int kernel: [95757] 272736 95757 2750735 2481055 4938 0 0 R
Mar 14 13:34:37 sh-101-19.int kernel: [95786] 272736 95786 2743601 2479753 4929 0 0 R
Mar 14 13:34:37 sh-101-19.int kernel: [95815] 272736 95815 401793 103809 280 0 0 R
Mar 14 13:34:37 sh-101-19.int kernel: [95845] 272736 95845 269147 10014 95 0 0 R
Mar 14 13:34:37 sh-101-19.int kernel: [95874] 272736 95874 269148 10016 92 0 0 R
Mar 14 13:34:37 sh-101-19.int kernel: [95903] 272736 95903 269148 10015 95 0 0 R
Mar 14 13:34:37 sh-101-19.int kernel: [95932] 272736 95932 269148 10015 95 0 0 R
Mar 14 13:34:37 sh-101-19.int kernel: [95961] 272736 95961 269148 10015 95 0 0 R
Mar 14 13:34:37 sh-101-19.int kernel: [95990] 272736 95990 269149 10015 96 0 0 R
Mar 14 13:34:37 sh-101-19.int kernel: [96020] 272736 96020 269147 10015 93 0 0 R
Mar 14 13:34:37 sh-101-19.int kernel: [96048] 272736 96048 269148 10015 97 0 0 R
Mar 14 13:34:37 sh-101-19.int kernel: [96077] 272736 96077 269148 10015 94 0 0 R
Mar 14 13:34:37 sh-101-19.int kernel: [96105] 272736 96105 269149 10015 94 0 0 R
Mar 14 13:34:37 sh-101-19.int kernel: [96134] 272736 96134 269148 10014 95 0 0 R
Mar 14 13:34:37 sh-101-19.int kernel: [96163] 272736 96163 269147 10014 96 0 0 R
Mar 14 13:34:37 sh-101-19.int kernel: Memory cgroup out of memory: Kill process 95482 (R) score 162 or sacrifice child
Mar 14 13:34:37 sh-101-19.int kernel: Killed process 95456 (R) total-vm:14727044kB, anon-rss:13582156kB, file-rss:3268kB, shmem-rss:0kB
Mar 15 18:29:06 sh-101-19.int kernel: Lustre: 163836:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552699715/real 1552699715] req@ffff9c03ea826300 x1626168701359216/t0(0) o36->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 544/1752 e 0 to 1 dl 1552699746 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
Mar 15 18:29:06 sh-101-19.int kernel: Lustre: 163836:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 23 previous similar messages
Mar 15 18:29:06 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 15 18:29:06 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 15 18:29:06 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Mar 15 18:29:37 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 15 18:29:37 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 15 18:30:08 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 15 18:30:39 sh-101-19.int kernel: Lustre: 163836:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552699808/real 1552699808] req@ffff9c03ea826300 x1626168701359216/t0(0) o36->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 544/1752 e 0 to 1 dl 1552699839 ref 2 fl Rpc:X/2/ffffffff rc -11/-1
Mar 15 18:30:39 sh-101-19.int kernel: Lustre: 163836:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
Mar 15 18:30:39 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 15 18:30:40 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Mar 15 18:31:11 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 15 18:31:11 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Mar 15 18:32:13 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 15 18:32:13 sh-101-19.int kernel: Lustre: Skipped 2 previous similar messages
Mar 15 18:32:44 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 15 18:32:44 sh-101-19.int kernel: Lustre: Skipped 2 previous similar messages
Mar 15 18:33:15 sh-101-19.int kernel: Lustre: 163836:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552699964/real 1552699964] req@ffff9c03ea826300 x1626168701359216/t0(0) o36->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 544/1752 e 0 to 1 dl 1552699995 ref 2 fl Rpc:X/2/ffffffff rc -11/-1
Mar 15 18:33:15 sh-101-19.int kernel: Lustre: 163836:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages
Mar 15 18:34:48 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 15 18:34:48 sh-101-19.int kernel: Lustre: Skipped 4 previous similar messages
Mar 15 18:35:19 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 15 18:35:19 sh-101-19.int kernel: Lustre: Skipped 4 previous similar messages
Mar 15 18:38:25 sh-101-19.int kernel: Lustre: 163836:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552700274/real 1552700274] req@ffff9c03ea826300 x1626168701359216/t0(0) o36->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 544/1752 e 0 to 1 dl 1552700305 ref 2 fl Rpc:X/2/ffffffff rc -11/-1
Mar 15 18:38:25 sh-101-19.int kernel: Lustre: 163836:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 9 previous similar messages
Mar 15 18:39:58 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 15 18:39:58 sh-101-19.int kernel: Lustre: Skipped 9 previous similar messages
Mar 15 18:40:29 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 15 18:40:29 sh-101-19.int kernel: Lustre: Skipped 9 previous similar messages
Mar 15 18:48:45 sh-101-19.int kernel: Lustre: 163836:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552700894/real 1552700894] req@ffff9c03ea826300 x1626168701359216/t0(0) o36->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 544/1752 e 0 to 1 dl 1552700925 ref 2 fl Rpc:X/2/ffffffff rc -11/-1
Mar 15 18:48:45 sh-101-19.int kernel: Lustre: 163836:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 19 previous similar messages
Mar 15 18:50:18 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 15 18:50:18 sh-101-19.int kernel: Lustre: Skipped 19 previous similar messages
Mar 15 18:50:49 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 15 18:50:49 sh-101-19.int kernel: Lustre: Skipped 19 previous similar messages
Mar 15 18:59:05 sh-101-19.int kernel: Lustre: 163836:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552701514/real 1552701514] req@ffff9c03ea826300 x1626168701359216/t0(0) o36->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 544/1752 e 0 to 1 dl 1552701545 ref 2 fl Rpc:X/2/ffffffff rc -11/-1
Mar 15 18:59:05 sh-101-19.int kernel: Lustre: 163836:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 19 previous similar messages
Mar 15 19:00:38 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 15 19:00:38 sh-101-19.int kernel: Lustre: Skipped 19 previous similar messages
Mar 15 19:01:09 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 15 19:01:09 sh-101-19.int kernel: Lustre: Skipped 19 previous similar messages
Mar 15 19:02:52 sh-101-19.int kernel: LustreError: 11-0: fir-MDT0001-mdc-ffff9c05b8776000: operation mds_reint to node 10.0.10.52@o2ib7 failed: rc = -19
Mar 15 19:06:54 sh-101-19.int kernel: LustreError: 93465:0:(client.c:3023:ptlrpc_replay_interpret()) @@@ status 301, old was 0 req@ffff9be88cd00000 x1626168686779776/t103496252801(103496252801) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 1160/560 e 0 to 0 dl 1552702045 ref 2 fl Interpret:RP/4/0 rc 301/301
Mar 15 19:07:15 sh-101-19.int kernel: LustreError: 46919:0:(file.c:4393:ll_inode_revalidate_fini()) fir: revalidate FID [0x200000007:0x1:0x0] error: rc = -4
Mar 17 08:59:41 sh-101-19.int kernel: Lustre: fir-MDT0002-mdc-ffff9c05b8776000: Connection to fir-MDT0002 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 17 08:59:41 sh-101-19.int kernel: Lustre: Skipped 8 previous similar messages
Mar 17 08:59:48 sh-101-19.int kernel: Lustre: 93467:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552838381/real 1552838381] req@ffff9becae211200 x1626169395486112/t0(0) o400->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 224/224 e 0 to 1 dl 1552838388 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Mar 17 08:59:48 sh-101-19.int kernel: Lustre: 93467:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 8 previous similar messages
Mar 17 09:01:10 sh-101-19.int kernel: Lustre: 93475:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1552838463/real 1552838463] req@ffff9bf230300900 x1626169395494416/t0(0) o400->MGC10.0.10.51@o2ib7@10.0.10.51@o2ib7:26/25 lens 224/224 e 0 to 1 dl 1552838470 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Mar 17 09:01:10 sh-101-19.int kernel: LustreError: 166-1: MGC10.0.10.51@o2ib7: Connection to MGS (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will fail
Mar 17 09:01:35 sh-101-19.int kernel: LustreError: 93465:0:(client.c:3023:ptlrpc_replay_interpret()) @@@ status 301, old was 0 req@ffff9bf877e19e00 x1626168894474432/t107556607954(107556607954) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 1768/560 e 0 to 0 dl 1552838502 ref 2 fl Interpret:RP/4/0 rc 301/301
Mar 17 09:01:35 sh-101-19.int kernel: LustreError: 93465:0:(client.c:3023:ptlrpc_replay_interpret()) Skipped 3 previous similar messages
Mar 17 09:02:00 sh-101-19.int kernel: Lustre: Evicted from MGS (at MGC10.0.10.51@o2ib7_0) after server handle changed from 0x974d7e52602357 to 0xe9a38a32063647cc
Mar 17 09:02:00 sh-101-19.int kernel: Lustre: MGC10.0.10.51@o2ib7: Connection restored to MGC10.0.10.51@o2ib7_0 (at 10.0.10.51@o2ib7)
Mar 17 09:02:00 sh-101-19.int kernel: Lustre: Skipped 8 previous similar messages
Mar 17 09:02:06 sh-101-19.int kernel: LustreError: 108735:0:(lmv_obd.c:1412:lmv_statfs()) can't stat MDS #0 (fir-MDT0003-mdc-ffff9c05b8776000), error -4
Mar 17 09:02:06 sh-101-19.int kernel: LustreError: 108735:0:(llite_lib.c:1807:ll_statfs_internal()) md_statfs fails: rc = -4
Mar 19 21:12:30 sh-101-19.int kernel: Lustre: 168607:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1553055129/real 1553055129] req@ffff9beb055c0f00 x1626169432712448/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 480/568 e 1 to 1 dl 1553055150 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
Mar 19 21:12:30 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 19 21:12:30 sh-101-19.int kernel: Lustre: Skipped 2 previous similar messages
Mar 19 21:12:30 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 19 21:12:30 sh-101-19.int kernel: Lustre: Skipped 4 previous similar messages
Mar 19 21:12:51 sh-101-19.int kernel: Lustre: 168607:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1553055150/real 1553055150] req@ffff9beb055c0f00 x1626169432712448/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 480/568 e 1 to 1 dl 1553055171 ref 2 fl Rpc:X/2/ffffffff rc -11/-1
Mar 19 21:12:51 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 19 21:12:51 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 19 21:13:12 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 19 21:13:12 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 19 21:13:33 sh-101-19.int kernel: Lustre: 168607:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1553055192/real 1553055192] req@ffff9beb055c0f00 x1626169432712448/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 480/568 e 1 to 1 dl 1553055213 ref 2 fl Rpc:X/2/ffffffff rc -11/-1
Mar 19 21:13:33 sh-101-19.int kernel: Lustre: 168607:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Mar 19 21:13:54 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 19 21:13:54 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Mar 19 21:13:54 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 19 21:13:54 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Mar 19 21:14:57 sh-101-19.int kernel: Lustre: 168607:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1553055276/real 1553055276] req@ffff9beb055c0f00 x1626169432712448/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 480/568 e 1 to 1 dl 1553055297 ref 2 fl Rpc:X/2/ffffffff rc -11/-1
Mar 19 21:14:57 sh-101-19.int kernel: Lustre: 168607:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages
Mar 19 21:15:18 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 19 21:15:18 sh-101-19.int kernel: Lustre: Skipped 3 previous similar messages
Mar 19 21:15:18 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 19 21:15:18 sh-101-19.int kernel: Lustre: Skipped 3 previous similar messages
Mar 19 21:17:57 sh-101-19.int kernel: Lustre: 168605:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1553055446/real 1553055446] req@ffff9be925ed6c00 x1626169432791136/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 600/2088 e 0 to 1 dl 1553055477 ref 2 fl Rpc:X/2/ffffffff rc -11/-1
Mar 19 21:17:57 sh-101-19.int kernel: Lustre: 168605:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages
Mar 19 21:17:57 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 19 21:17:57 sh-101-19.int kernel: Lustre: Skipped 3 previous similar messages
Mar 19 21:17:57 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 19 21:17:57 sh-101-19.int kernel: Lustre: Skipped 3 previous similar messages
Mar 19 21:22:58 sh-101-19.int kernel: Lustre: 171221:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1553055747/real 1553055747] req@ffff9be78dfb5a00 x1626169432819536/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 568/3376 e 0 to 1 dl 1553055778 ref 2 fl Rpc:X/2/ffffffff rc -11/-1
Mar 19 21:22:58 sh-101-19.int kernel: Lustre: 171221:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 7 previous similar messages
Mar 19 21:22:58 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 19 21:22:58 sh-101-19.int kernel: Lustre: Skipped 7 previous similar messages
Mar 19 21:22:58 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 19 21:22:58 sh-101-19.int kernel: Lustre: Skipped 7 previous similar messages
Mar 19 21:23:29 sh-101-19.int kernel: LustreError: 167-0: fir-MDT0001-mdc-ffff9c05b8776000: This client was evicted by fir-MDT0001; in progress operations using this service will fail.
Mar 19 21:23:29 sh-101-19.int kernel: LustreError: 168605:0:(file.c:4393:ll_inode_revalidate_fini()) fir: revalidate FID [0x24000cdb5:0x1172f:0x0] error: rc = -108
Mar 19 23:07:57 sh-101-19.int kernel: python invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0
Mar 19 23:07:57 sh-101-19.int kernel: python cpuset=step_0 mems_allowed=0-1
Mar 19 23:07:57 sh-101-19.int kernel: CPU: 13 PID: 15844 Comm: python Kdump: loaded Tainted: G OE ------------ T 3.10.0-957.5.1.el7.x86_64 #1
Mar 19 23:07:57 sh-101-19.int kernel: Hardware name: Dell Inc. PowerEdge C6320/082F9M, BIOS 2.8.0 05/28/2018
Mar 19 23:07:57 sh-101-19.int kernel: Call Trace:
Mar 19 23:07:57 sh-101-19.int kernel: [] dump_stack+0x19/0x1b
Mar 19 23:07:57 sh-101-19.int kernel: [] dump_header+0x90/0x229
Mar 19 23:07:57 sh-101-19.int kernel: [] ? default_wake_function+0x12/0x20
Mar 19 23:07:57 sh-101-19.int kernel: [] ? find_lock_task_mm+0x56/0xc0
Mar 19 23:07:57 sh-101-19.int kernel: [] ? try_get_mem_cgroup_from_mm+0x28/0x60
Mar 19 23:07:57 sh-101-19.int kernel: [] oom_kill_process+0x254/0x3d0
Mar 19 23:07:57 sh-101-19.int kernel: [] mem_cgroup_oom_synchronize+0x546/0x570
Mar 19 23:07:57 sh-101-19.int kernel: [] ? mem_cgroup_charge_common+0xc0/0xc0
Mar 19 23:07:57 sh-101-19.int kernel: [] pagefault_out_of_memory+0x14/0x90
Mar 19 23:07:57 sh-101-19.int kernel: [] mm_fault_error+0x6a/0x157
Mar 19 23:07:57 sh-101-19.int kernel: [] __do_page_fault+0x3c8/0x500
Mar 19 23:07:57 sh-101-19.int kernel: [] do_page_fault+0x35/0x90
Mar 19 23:07:57 sh-101-19.int kernel: [] page_fault+0x28/0x30
Mar 19 23:07:58 sh-101-19.int kernel: Task in /slurm/uid_30356/job_39376896/step_0/task_0 killed as a result of limit of /slurm/uid_30356/job_39376896
Mar 19 23:07:58 sh-101-19.int kernel: memory: usage 15728640kB, limit 15728640kB, failcnt 11994
Mar 19 23:07:58 sh-101-19.int kernel: memory+swap: usage 15728640kB, limit 15728640kB, failcnt 0
Mar 19 23:07:58 sh-101-19.int kernel: kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
Mar 19 23:07:58 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_30356/job_39376896: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 19 23:07:58 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_30356/job_39376896/step_extern: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 19 23:07:58 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_30356/job_39376896/step_extern/task_0: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 19 23:07:58 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_30356/job_39376896/step_batch: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 19 23:07:58 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_30356/job_39376896/step_batch/task_0: cache:0KB rss:4508KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:2304KB active_anon:2204KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 19 23:07:58 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_30356/job_39376896/step_0: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 19 23:07:58 sh-101-19.int kernel: Memory cgroup stats for /slurm/uid_30356/job_39376896/step_0/task_0: cache:0KB rss:15724132KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:569344KB active_anon:15154788KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 19 23:07:58 sh-101-19.int kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Mar 19 23:07:58 sh-101-19.int kernel: [15768] 0 15768 26988 88 10 0 0 sleep
Mar 19 23:07:58 sh-101-19.int kernel: [15802] 30356 15802 28335 445 13 0 0 slurm_script
Mar 19 23:07:58 sh-101-19.int kernel: [15820] 30356 15820 80951 1489 39 0 0 srun
Mar 19 23:07:58 sh-101-19.int kernel: [15826] 30356 15826 13100 218 30 0 0 srun
Mar 19 23:07:58 sh-101-19.int kernel: [15844] 30356 15844 5720573 3934159 7808 0 0 python
Mar 19 23:07:58 sh-101-19.int kernel: Memory cgroup out of memory: Kill process 15844 (python) score 1002 or sacrifice child
Mar 19 23:07:58 sh-101-19.int kernel: Killed process 15844 (python) total-vm:22882292kB, anon-rss:15724052kB, file-rss:12584kB, shmem-rss:0kB
Mar 20 00:21:45 sh-101-19.int kernel: Lustre: 170148:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1553066474/real 1553066474] req@ffff9bea46a56600 x1626169440403344/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 600/2088 e 0 to 1 dl 1553066505 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
Mar 20 00:21:45 sh-101-19.int kernel: Lustre: 170148:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Mar 20 00:21:45 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 20 00:21:45 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Mar 20 00:21:45 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 20 00:21:45 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message
Mar 20 00:23:18 sh-101-19.int kernel: Lustre: 170148:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1553066567/real 1553066567] req@ffff9bea46a56600 x1626169440403344/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 600/2088 e 0 to 1 dl 1553066598 ref 2 fl Rpc:X/2/ffffffff rc 0/-1
Mar 20 00:23:18 sh-101-19.int kernel: Lustre: 170148:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
Mar 20 00:23:18 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
Mar 20 00:23:18 sh-101-19.int kernel: Lustre: Skipped 2 previous similar messages
Mar 20 00:23:18 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7)
Mar 20 00:23:18 sh-101-19.int kernel: Lustre: Skipped 2 previous similar messages
Mar 20 00:26:17 sh-101-19.int kernel: Lustre: 23049:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1553066746/real 1553066746] req@ffff9c0503468c00 x1626169440576960/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 480/568 e 0 to 1 dl 1553066777 ref 2 fl Rpc:X/2/ffffffff rc 0/-1
Mar 20 00:26:17 sh-101-19.int kernel: Lustre: 23049:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages
Mar 20 00:26:17 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost;
in progress operations using this service will wait for recovery to complete Mar 20 00:26:17 sh-101-19.int kernel: Lustre: Skipped 4 previous similar messages Mar 20 00:26:17 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 20 00:26:17 sh-101-19.int kernel: Lustre: Skipped 4 previous similar messages Mar 20 00:31:47 sh-101-19.int kernel: Lustre: 171221:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1553067076/real 1553067076] req@ffff9bf081211e00 x1626169440741344/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 568/3376 e 0 to 1 dl 1553067107 ref 2 fl Rpc:X/2/ffffffff rc 0/-1 Mar 20 00:31:47 sh-101-19.int kernel: Lustre: 171221:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 7 previous similar messages Mar 20 00:31:47 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 20 00:31:47 sh-101-19.int kernel: Lustre: Skipped 7 previous similar messages Mar 20 00:31:47 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 20 00:31:47 sh-101-19.int kernel: Lustre: Skipped 7 previous similar messages Mar 20 00:32:18 sh-101-19.int kernel: LustreError: 167-0: fir-MDT0001-mdc-ffff9c05b8776000: This client was evicted by fir-MDT0001; in progress operations using this service will fail. 
Mar 20 00:32:18 sh-101-19.int kernel: LustreError: 171221:0:(file.c:216:ll_close_inode_openhandle()) fir-clilmv-ffff9c05b8776000: inode [0x24000cdb5:0x11760:0x0] mdc close failed: rc = -108 Mar 20 00:32:18 sh-101-19.int kernel: LustreError: 24865:0:(ldlm_resource.c:1146:ldlm_resource_complain()) fir-MDT0001-mdc-ffff9c05b8776000: namespace resource [0x240000402:0x5:0x0].0x0 (ffff9bfeb45b06c0) refcount nonzero (1) after lock cleanup; forcing cleanup. Mar 20 00:32:18 sh-101-19.int kernel: LustreError: 171221:0:(file.c:216:ll_close_inode_openhandle()) Skipped 1 previous similar message Mar 20 02:20:11 sh-101-19.int kernel: Lustre: 168605:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1553073580/real 1553073580] req@ffff9bf5d46de900 x1626169449554384/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 600/2088 e 0 to 1 dl 1553073611 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Mar 20 02:20:11 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 20 02:20:11 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 20 02:20:11 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 20 02:20:11 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 20 02:20:11 sh-101-19.int kernel: Lustre: 168605:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Mar 20 02:21:44 sh-101-19.int kernel: Lustre: 168605:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1553073673/real 1553073673] req@ffff9bf5d46de900 x1626169449554384/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 600/2088 e 0 to 1 dl 1553073704 ref 2 fl Rpc:X/2/ffffffff rc -11/-1 Mar 20 02:21:44 sh-101-19.int kernel: 
Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 20 02:21:44 sh-101-19.int kernel: Lustre: Skipped 2 previous similar messages Mar 20 02:21:44 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 20 02:21:44 sh-101-19.int kernel: Lustre: Skipped 2 previous similar messages Mar 20 02:21:44 sh-101-19.int kernel: Lustre: 168605:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages Mar 20 02:24:45 sh-101-19.int kernel: Lustre: 170148:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1553073854/real 1553073854] req@ffff9bf081216c00 x1626169449678352/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 600/2088 e 0 to 1 dl 1553073885 ref 2 fl Rpc:X/2/ffffffff rc -11/-1 Mar 20 02:24:45 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 20 02:24:45 sh-101-19.int kernel: Lustre: Skipped 4 previous similar messages Mar 20 02:24:45 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 20 02:24:45 sh-101-19.int kernel: Lustre: Skipped 4 previous similar messages Mar 20 02:24:45 sh-101-19.int kernel: Lustre: 170148:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 8 previous similar messages Mar 20 05:16:29 sh-101-19.int kernel: Lustre: 171167:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1553084168/real 1553084168] req@ffff9bfa6964e300 x1626169474889360/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 576/2088 e 1 to 1 dl 1553084189 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Mar 20 
05:16:29 sh-101-19.int kernel: Lustre: 171167:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 7 previous similar messages Mar 20 05:16:29 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 20 05:16:29 sh-101-19.int kernel: Lustre: Skipped 4 previous similar messages Mar 20 05:16:29 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 20 05:16:29 sh-101-19.int kernel: Lustre: Skipped 4 previous similar messages Mar 20 05:17:11 sh-101-19.int kernel: Lustre: 171167:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1553084210/real 1553084210] req@ffff9bfa6964e300 x1626169474889360/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 576/2088 e 1 to 1 dl 1553084231 ref 2 fl Rpc:X/2/ffffffff rc -11/-1 Mar 20 05:17:11 sh-101-19.int kernel: Lustre: 171167:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Mar 20 05:17:11 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 20 05:17:11 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 20 05:17:11 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 20 05:17:11 sh-101-19.int kernel: Lustre: Skipped 1 previous similar message Mar 20 05:18:35 sh-101-19.int kernel: Lustre: 171167:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1553084294/real 1553084294] req@ffff9bfa6964e300 x1626169474889360/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 576/2088 e 1 to 1 dl 1553084315 ref 2 fl 
Rpc:X/2/ffffffff rc -11/-1 Mar 20 05:18:35 sh-101-19.int kernel: Lustre: 171167:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages Mar 20 05:18:35 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 20 05:18:35 sh-101-19.int kernel: Lustre: Skipped 3 previous similar messages Mar 20 05:18:35 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 20 05:18:35 sh-101-19.int kernel: Lustre: Skipped 3 previous similar messages Mar 20 05:21:38 sh-101-19.int kernel: Lustre: 171167:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1553084467/real 1553084467] req@ffff9bf4080a5700 x1626169474942800/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 480/568 e 0 to 1 dl 1553084498 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Mar 20 05:21:38 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 20 05:21:38 sh-101-19.int kernel: Lustre: Skipped 4 previous similar messages Mar 20 05:21:38 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 20 05:21:38 sh-101-19.int kernel: Lustre: Skipped 4 previous similar messages Mar 20 05:21:38 sh-101-19.int kernel: Lustre: 171167:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 8 previous similar messages Mar 20 05:26:40 sh-101-19.int kernel: Lustre: 170409:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1553084769/real 1553084769] req@ffff9c05ea645d00 x1626169474986992/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 480/568 e 0 
to 1 dl 1553084800 ref 2 fl Rpc:X/2/ffffffff rc -11/-1 Mar 20 05:26:40 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 20 05:26:40 sh-101-19.int kernel: Lustre: Skipped 7 previous similar messages Mar 20 05:26:40 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 20 05:26:40 sh-101-19.int kernel: Lustre: Skipped 7 previous similar messages Mar 20 05:26:40 sh-101-19.int kernel: Lustre: 170409:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 19 previous similar messages Mar 20 07:30:13 sh-101-19.int kernel: Lustre: 171171:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1553092192/real 1553092192] req@ffff9bf92ff82700 x1626169496433792/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 600/2088 e 1 to 1 dl 1553092213 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Mar 20 07:30:13 sh-101-19.int kernel: Lustre: 171171:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages Mar 20 07:30:13 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 20 07:30:13 sh-101-19.int kernel: Lustre: Skipped 2 previous similar messages Mar 20 07:30:13 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 20 07:30:13 sh-101-19.int kernel: Lustre: Skipped 2 previous similar messages Mar 20 07:31:55 sh-101-19.int kernel: Lustre: 31169:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1553092284/real 1553092284] req@ffff9bf965e4ec00 x1626169496436816/t0(0) 
o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 600/2088 e 0 to 1 dl 1553092315 ref 2 fl Rpc:X/2/ffffffff rc -11/-1 Mar 20 07:31:55 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 20 07:31:55 sh-101-19.int kernel: Lustre: Skipped 2 previous similar messages Mar 20 07:31:55 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 20 07:31:55 sh-101-19.int kernel: Lustre: Skipped 2 previous similar messages Mar 20 07:31:55 sh-101-19.int kernel: Lustre: 31169:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages Mar 20 07:34:30 sh-101-19.int kernel: Lustre: 170149:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1553092439/real 1553092439] req@ffff9bfc9c1aa400 x1626169496436768/t0(0) o36->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 488/4528 e 0 to 1 dl 1553092470 ref 2 fl Rpc:X/2/ffffffff rc -11/-1 Mar 20 07:34:30 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 20 07:34:30 sh-101-19.int kernel: Lustre: Skipped 4 previous similar messages Mar 20 07:34:30 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 20 07:34:30 sh-101-19.int kernel: Lustre: Skipped 4 previous similar messages Mar 20 07:34:30 sh-101-19.int kernel: Lustre: 170149:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 8 previous similar messages Mar 20 07:39:40 sh-101-19.int kernel: Lustre: 170149:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1553092749/real 1553092749] 
req@ffff9bfc9c1aa400 x1626169496436768/t0(0) o36->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 488/4528 e 0 to 1 dl 1553092780 ref 2 fl Rpc:X/2/ffffffff rc -11/-1 Mar 20 07:39:40 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 20 07:39:40 sh-101-19.int kernel: Lustre: Skipped 9 previous similar messages Mar 20 07:39:40 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 20 07:39:40 sh-101-19.int kernel: Lustre: Skipped 9 previous similar messages Mar 20 07:39:40 sh-101-19.int kernel: Lustre: 170149:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 18 previous similar messages Mar 20 07:50:00 sh-101-19.int kernel: Lustre: 170149:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1553093369/real 1553093369] req@ffff9bfc9c1aa400 x1626169496436768/t0(0) o36->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 488/4528 e 0 to 1 dl 1553093400 ref 2 fl Rpc:X/2/ffffffff rc -11/-1 Mar 20 07:50:00 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 20 07:50:00 sh-101-19.int kernel: Lustre: Skipped 19 previous similar messages Mar 20 07:50:00 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 20 07:50:00 sh-101-19.int kernel: Lustre: Skipped 19 previous similar messages Mar 20 07:50:00 sh-101-19.int kernel: Lustre: 170149:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 36 previous similar messages Mar 20 08:00:20 sh-101-19.int kernel: Lustre: 170149:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: 
[sent 1553093989/real 1553093989] req@ffff9bfc9c1aa400 x1626169496436768/t0(0) o36->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 488/4528 e 0 to 1 dl 1553094020 ref 2 fl Rpc:X/2/ffffffff rc -11/-1 Mar 20 08:00:20 sh-101-19.int kernel: Lustre: 31169:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1553093989/real 1553093989] req@ffff9bf965e4ec00 x1626169496436816/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 600/2088 e 0 to 1 dl 1553094020 ref 2 fl Rpc:X/2/ffffffff rc -11/-1 Mar 20 08:00:20 sh-101-19.int kernel: Lustre: 31169:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 35 previous similar messages Mar 20 08:00:20 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 20 08:00:20 sh-101-19.int kernel: Lustre: Skipped 19 previous similar messages Mar 20 08:00:20 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 20 08:00:20 sh-101-19.int kernel: Lustre: Skipped 19 previous similar messages Mar 20 08:10:40 sh-101-19.int kernel: Lustre: 170149:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1553094609/real 1553094609] req@ffff9bfc9c1aa400 x1626169496436768/t0(0) o36->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 488/4528 e 0 to 1 dl 1553094640 ref 2 fl Rpc:X/2/ffffffff rc -11/-1 Mar 20 08:10:40 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 20 08:10:40 sh-101-19.int kernel: Lustre: Skipped 19 previous similar messages Mar 20 08:10:40 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 
10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 20 08:10:40 sh-101-19.int kernel: Lustre: Skipped 19 previous similar messages Mar 20 08:10:40 sh-101-19.int kernel: Lustre: 170149:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 37 previous similar messages Mar 20 08:21:00 sh-101-19.int kernel: Lustre: 170149:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1553095229/real 1553095229] req@ffff9bfc9c1aa400 x1626169496436768/t0(0) o36->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 488/4528 e 0 to 1 dl 1553095260 ref 2 fl Rpc:X/2/ffffffff rc -11/-1 Mar 20 08:21:00 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 20 08:21:00 sh-101-19.int kernel: Lustre: Skipped 19 previous similar messages Mar 20 08:21:00 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 20 08:21:00 sh-101-19.int kernel: Lustre: Skipped 19 previous similar messages Mar 20 08:21:00 sh-101-19.int kernel: Lustre: 170149:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 35 previous similar messages Mar 20 08:31:20 sh-101-19.int kernel: Lustre: 170149:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1553095849/real 1553095849] req@ffff9bfc9c1aa400 x1626169496436768/t0(0) o36->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 488/4528 e 0 to 1 dl 1553095880 ref 2 fl Rpc:X/2/ffffffff rc -11/-1 Mar 20 08:31:20 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 20 08:31:20 sh-101-19.int kernel: Lustre: Skipped 19 previous similar messages Mar 20 08:31:20 sh-101-19.int kernel: Lustre: 
fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 20 08:31:20 sh-101-19.int kernel: Lustre: Skipped 19 previous similar messages Mar 20 08:31:20 sh-101-19.int kernel: Lustre: 170149:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 35 previous similar messages Mar 20 08:55:34 sh-101-19.int kernel: Lustre: 171221:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1553097313/real 1553097313] req@ffff9bf2a8369200 x1626169503512896/t0(0) o36->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 488/4528 e 1 to 1 dl 1553097334 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 Mar 20 08:55:34 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 20 08:55:34 sh-101-19.int kernel: Lustre: Skipped 7 previous similar messages Mar 20 08:55:34 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 20 08:55:34 sh-101-19.int kernel: Lustre: Skipped 7 previous similar messages Mar 20 08:55:34 sh-101-19.int kernel: Lustre: 171221:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 14 previous similar messages Mar 20 08:56:58 sh-101-19.int kernel: Lustre: 171221:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1553097397/real 1553097397] req@ffff9bf2a8369200 x1626169503512896/t0(0) o36->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 488/4528 e 1 to 1 dl 1553097418 ref 2 fl Rpc:X/2/ffffffff rc -11/-1 Mar 20 08:56:58 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 20 08:56:58 sh-101-19.int kernel: Lustre: Skipped 3 previous similar messages Mar 20 
08:56:58 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 20 08:56:58 sh-101-19.int kernel: Lustre: Skipped 3 previous similar messages Mar 20 08:56:58 sh-101-19.int kernel: Lustre: 171221:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 13 previous similar messages Mar 20 08:57:42 sh-101-19.int kernel: LustreError: 11-0: fir-MDT0001-mdc-ffff9c05b8776000: operation ldlm_enqueue to node 10.0.10.52@o2ib7 failed: rc = -107 Mar 20 08:57:42 sh-101-19.int kernel: LustreError: 167-0: fir-MDT0001-mdc-ffff9c05b8776000: This client was evicted by fir-MDT0001; in progress operations using this service will fail. Mar 20 08:57:42 sh-101-19.int kernel: LustreError: Skipped 1 previous similar message Mar 20 08:57:42 sh-101-19.int kernel: LustreError: 170148:0:(file.c:4393:ll_inode_revalidate_fini()) fir: revalidate FID [0x24000ed7c:0xdd:0x0] error: rc = -107 Mar 20 08:59:46 sh-101-19.int kernel: Lustre: 25056:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1553097555/real 1553097555] req@ffff9be78dfb0300 x1626169503655152/t0(0) o101->fir-MDT0001-mdc-ffff9c05b8776000@10.0.10.52@o2ib7:12/10 lens 600/2088 e 0 to 1 dl 1553097586 ref 2 fl Rpc:X/2/ffffffff rc -11/-1 Mar 20 08:59:46 sh-101-19.int kernel: Lustre: 25056:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 7 previous similar messages Mar 20 08:59:46 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection to fir-MDT0001 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Mar 20 08:59:46 sh-101-19.int kernel: Lustre: Skipped 6 previous similar messages Mar 20 08:59:46 sh-101-19.int kernel: Lustre: fir-MDT0001-mdc-ffff9c05b8776000: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) Mar 20 08:59:46 sh-101-19.int kernel: Lustre: Skipped 6 previous similar messages