[ 0.000000] SRAT: Node 0 PXM 0 [mem 0x00100000-0x7fffffff]
[ 0.000000] SRAT: Node 0 PXM 0 [mem 0x100000000-0x107fffffff]
[ 0.000000] SRAT: Node 1 PXM 1 [mem 0x1080000000-0x207fffffff]
[ 0.000000] SRAT: Node 2 PXM 2 [mem 0x2080000000-0x307fffffff]
[ 0.000000] SRAT: Node 3 PXM 3 [mem 0x3080000000-0x407fffffff]
[ 0.000000] NUMA: Initialized distance table, cnt=4
[ 0.000000] NUMA: Node 0 [mem 0x00000000-0x0009ffff] + [mem 0x00100000-0x7fffffff] -> [mem 0x00000000-0x7fffffff]
[ 0.000000] NUMA: Node 0 [mem 0x00000000-0x7fffffff] + [mem 0x100000000-0x107fffffff] -> [mem 0x00000000-0x107fffffff]
[ 0.000000] NODE_DATA(0) allocated [mem 0x107f359000-0x107f37ffff]
[ 0.000000] NODE_DATA(1) allocated [mem 0x207ff59000-0x207ff7ffff]
[ 0.000000] NODE_DATA(2) allocated [mem 0x307ff59000-0x307ff7ffff]
[ 0.000000] NODE_DATA(3) allocated [mem 0x407ff58000-0x407ff7efff]
[ 0.000000] Reserving 176MB of memory at 704MB for crashkernel (System RAM: 261692MB)
[ 0.000000] Zone ranges:
[ 0.000000] DMA [mem 0x00001000-0x00ffffff]
[ 0.000000] DMA32 [mem 0x01000000-0xffffffff]
[ 0.000000] Normal [mem 0x100000000-0x407ff7ffff]
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x00001000-0x0008efff]
[ 0.000000] node 0: [mem 0x00090000-0x0009ffff]
[ 0.000000] node 0: [mem 0x00100000-0x4f780fff]
[ 0.000000] node 0: [mem 0x5778a000-0x6cacefff]
[ 0.000000] node 0: [mem 0x6ffff000-0x6fffffff]
[ 0.000000] node 0: [mem 0x100000000-0x107f37ffff]
[ 0.000000] node 1: [mem 0x1080000000-0x207ff7ffff]
[ 0.000000] node 2: [mem 0x2080000000-0x307ff7ffff]
[ 0.000000] node 3: [mem 0x3080000000-0x407ff7ffff]
[ 0.000000] Initmem setup node 0 [mem 0x00001000-0x107f37ffff]
[ 0.000000] On node 0 totalpages: 16661989
[ 0.000000] DMA zone: 64 pages used for memmap
[ 0.000000] DMA zone: 1126 pages reserved
[ 0.000000] DMA zone: 3998 pages, LIFO batch:0
[ 0.000000] DMA32 zone: 6380 pages used for memmap
[ 0.000000] DMA32 zone: 408263 pages, LIFO batch:31
[ 0.000000] Normal zone: 253902 pages used for memmap
[ 0.000000] Normal zone: 16249728 pages, LIFO batch:31
[ 0.000000] Initmem setup node 1 [mem 0x1080000000-0x207ff7ffff]
[ 0.000000] On node 1 totalpages: 16777088
[ 0.000000] Normal zone: 262142 pages used for memmap
[ 0.000000] Normal zone: 16777088 pages, LIFO batch:31
[ 0.000000] Initmem setup node 2 [mem 0x2080000000-0x307ff7ffff]
[ 0.000000] On node 2 totalpages: 16777088
[ 0.000000] Normal zone: 262142 pages used for memmap
[ 0.000000] Normal zone: 16777088 pages, LIFO batch:31
[ 0.000000] Initmem setup node 3 [mem 0x3080000000-0x407ff7ffff]
[ 0.000000] On node 3 totalpages: 16777088
[ 0.000000] Normal zone: 262142 pages used for memmap
[ 0.000000] Normal zone: 16777088 pages, LIFO batch:31
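The "Early memory node ranges" above work out to roughly 64 GiB of usable RAM on each of the four NUMA nodes, consistent with the 261692MB System RAM figure in the crashkernel line. A minimal Python sketch that recomputes the per-node totals from a captured copy of this console output (the boot.log filename is an assumption, not something the log defines):

    import re
    from collections import defaultdict

    # Match lines like: node 1: [mem 0x1080000000-0x207ff7ffff]
    RANGE = re.compile(r"node (\d+): \[mem 0x([0-9a-f]+)-0x([0-9a-f]+)\]")

    totals = defaultdict(int)
    with open("boot.log") as log:          # assumed capture of this output
        for line in log:
            m = RANGE.search(line)
            if m:
                node = int(m.group(1))
                start, end = int(m.group(2), 16), int(m.group(3), 16)
                totals[node] += end - start + 1   # ranges are inclusive

    for node in sorted(totals):
        print(f"node {node}: {totals[node] / 2**30:.1f} GiB usable")
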
[ 0.000000] ACPI: PM-Timer IO Port: 0x408
[ 0.000000] ACPI: Local APIC address 0xfee00000
[ 0.000000] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x10] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x02] lapic_id[0x20] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x03] lapic_id[0x30] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x04] lapic_id[0x08] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x05] lapic_id[0x18] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x06] lapic_id[0x28] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x07] lapic_id[0x38] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x08] lapic_id[0x02] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x09] lapic_id[0x12] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x0a] lapic_id[0x22] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x0b] lapic_id[0x32] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x0c] lapic_id[0x0a] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x0d] lapic_id[0x1a] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x0e] lapic_id[0x2a] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x0f] lapic_id[0x3a] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x10] lapic_id[0x04] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x11] lapic_id[0x14] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x12] lapic_id[0x24] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x13] lapic_id[0x34] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x14] lapic_id[0x0c] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x15] lapic_id[0x1c] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x16] lapic_id[0x2c] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x17] lapic_id[0x3c] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x18] lapic_id[0x01] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x19] lapic_id[0x11] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x1a] lapic_id[0x21] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x1b] lapic_id[0x31] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x1c] lapic_id[0x09] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x1d] lapic_id[0x19] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x1e] lapic_id[0x29] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x1f] lapic_id[0x39] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x20] lapic_id[0x03] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x21] lapic_id[0x13] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x22] lapic_id[0x23] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x23] lapic_id[0x33] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x24] lapic_id[0x0b] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x25] lapic_id[0x1b] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x26] lapic_id[0x2b] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x27] lapic_id[0x3b] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x28] lapic_id[0x05] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x29] lapic_id[0x15] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x2a] lapic_id[0x25] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x2b] lapic_id[0x35] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x2c] lapic_id[0x0d] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x2d] lapic_id[0x1d] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x2e] lapic_id[0x2d] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x2f] lapic_id[0x3d] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x30] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x31] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x32] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x33] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x34] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x35] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x36] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x37] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x38] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x39] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x3a] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x3b] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x3c] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x3d] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x3e] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x3f] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x40] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x41] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x42] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x43] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x44] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x45] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x46] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x47] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x48] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x49] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x4a] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x4b] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x4c] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x4d] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x4e] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x4f] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x50] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x51] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x52] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x53] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x54] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x55] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x56] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x57] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x58] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x59] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x5a] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x5b] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x5c] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x5d] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x5e] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x5f] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x60] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x61] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x62] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x63] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x64] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x65] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x66] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x67] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x68] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x69] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x6a] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x6b] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x6c] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x6d] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x6e] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x6f] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x70] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x71] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x72] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x73] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x74] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x75] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x76] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x77] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x78] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x79] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x7a] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x7b] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x7c] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x7d] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x7e] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x7f] lapic_id[0x00] disabled)
[ 0.000000] ACPI: LAPIC_NMI (acpi_id[0xff] high edge lint[0x1])
[ 0.000000] ACPI: IOAPIC (id[0x80] address[0xfec00000] gsi_base[0])
[ 0.000000] IOAPIC[0]: apic_id 128, version 33, address 0xfec00000, GSI 0-23
[ 0.000000] ACPI: IOAPIC (id[0x81] address[0xfd880000] gsi_base[24])
[ 0.000000] IOAPIC[1]: apic_id 129, version 33, address 0xfd880000, GSI 24-55
[ 0.000000] ACPI: IOAPIC (id[0x82] address[0xe0900000] gsi_base[56])
[ 0.000000] IOAPIC[2]: apic_id 130, version 33, address 0xe0900000, GSI 56-87
[ 0.000000] ACPI: IOAPIC (id[0x83] address[0xc5900000] gsi_base[88])
[ 0.000000] IOAPIC[3]: apic_id 131, version 33, address 0xc5900000, GSI 88-119
[ 0.000000] ACPI: IOAPIC (id[0x84] address[0xaa900000] gsi_base[120])
[ 0.000000] IOAPIC[4]: apic_id 132, version 33, address 0xaa900000, GSI 120-151
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
[ 0.000000] ACPI: IRQ0 used by override.
[ 0.000000] ACPI: IRQ9 used by override.
[ 0.000000] Using ACPI (MADT) for SMP configuration information
[ 0.000000] ACPI: HPET id: 0x10228201 base: 0xfed00000
[ 0.000000] smpboot: Allowing 128 CPUs, 80 hotplug CPUs
[ 0.000000] PM: Registered nosave memory: [mem 0x0008f000-0x0008ffff]
[ 0.000000] PM: Registered nosave memory: [mem 0x000a0000-0x000fffff]
[ 0.000000] PM: Registered nosave memory: [mem 0x3788e000-0x3788efff]
[ 0.000000] PM: Registered nosave memory: [mem 0x378a6000-0x378a6fff]
[ 0.000000] PM: Registered nosave memory: [mem 0x378a7000-0x378a7fff]
[ 0.000000] PM: Registered nosave memory: [mem 0x378cc000-0x378ccfff]
[ 0.000000] PM: Registered nosave memory: [mem 0x378cd000-0x378cdfff]
[ 0.000000] PM: Registered nosave memory: [mem 0x378d5000-0x378d5fff]
[ 0.000000] PM: Registered nosave memory: [mem 0x378d6000-0x378d6fff]
[ 0.000000] PM: Registered nosave memory: [mem 0x37907000-0x37907fff]
[ 0.000000] PM: Registered nosave memory: [mem 0x37908000-0x37908fff]
[ 0.000000] PM: Registered nosave memory: [mem 0x37939000-0x37939fff]
[ 0.000000] PM: Registered nosave memory: [mem 0x3793a000-0x3793afff]
[ 0.000000] PM: Registered nosave memory: [mem 0x379db000-0x379dbfff]
[ 0.000000] PM: Registered nosave memory: [mem 0x4f781000-0x57789fff]
[ 0.000000] PM: Registered nosave memory: [mem 0x6cacf000-0x6efcefff]
[ 0.000000] PM: Registered nosave memory: [mem 0x6efcf000-0x6fdfefff]
[ 0.000000] PM: Registered nosave memory: [mem 0x6fdff000-0x6fffefff]
[ 0.000000] PM: Registered nosave memory: [mem 0x70000000-0x8fffffff]
[ 0.000000] PM: Registered nosave memory: [mem 0x90000000-0xfec0ffff]
[ 0.000000] PM: Registered nosave memory: [mem 0xfec10000-0xfec10fff]
[ 0.000000] PM: Registered nosave memory: [mem 0xfec11000-0xfed7ffff]
[ 0.000000] PM: Registered nosave memory: [mem 0xfed80000-0xfed80fff]
[ 0.000000] PM: Registered nosave memory: [mem 0xfed81000-0xffffffff]
[ 0.000000] PM: Registered nosave memory: [mem 0x107f380000-0x107fffffff]
[ 0.000000] PM: Registered nosave memory: [mem 0x207ff80000-0x207fffffff]
[ 0.000000] PM: Registered nosave memory: [mem 0x307ff80000-0x307fffffff]
[ 0.000000] e820: [mem 0x90000000-0xfec0ffff] available for PCI devices
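The MADT dump above contains 48 enabled LAPIC entries and 80 disabled placeholders, which is exactly how the kernel arrives at "Allowing 128 CPUs, 80 hotplug CPUs". A small sketch, under the same boot.log assumption, that re-derives those counts:

    import re
    from collections import Counter

    LAPIC = re.compile(r"ACPI: LAPIC \(acpi_id\[0x[0-9a-f]+\] "
                       r"lapic_id\[0x[0-9a-f]+\] (enabled|disabled)\)")

    counts = Counter()
    with open("boot.log") as log:
        for line in log:
            m = LAPIC.search(line)
            if m:
                counts[m.group(1)] += 1

    # Expected here: Counter({'disabled': 80, 'enabled': 48}) -> 128 possible CPUs
    print(counts)
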
[ 0.000000] Booting paravirtualized kernel on bare hardware
[ 0.000000] setup_percpu: NR_CPUS:5120 nr_cpumask_bits:128 nr_cpu_ids:128 nr_node_ids:4
[ 0.000000] PERCPU: Embedded 38 pages/cpu @ffff9a61bee00000 s118784 r8192 d28672 u262144
[ 0.000000] pcpu-alloc: s118784 r8192 d28672 u262144 alloc=1*2097152
[ 0.000000] pcpu-alloc: [0] 000 004 008 012 016 020 024 028
[ 0.000000] pcpu-alloc: [0] 032 036 040 044 048 052 056 060
[ 0.000000] pcpu-alloc: [0] 064 068 072 076 080 084 088 092
[ 0.000000] pcpu-alloc: [0] 096 100 104 108 112 116 120 124
[ 0.000000] pcpu-alloc: [1] 001 005 009 013 017 021 025 029
[ 0.000000] pcpu-alloc: [1] 033 037 041 045 049 053 057 061
[ 0.000000] pcpu-alloc: [1] 065 069 073 077 081 085 089 093
[ 0.000000] pcpu-alloc: [1] 097 101 105 109 113 117 121 125
[ 0.000000] pcpu-alloc: [2] 002 006 010 014 018 022 026 030
[ 0.000000] pcpu-alloc: [2] 034 038 042 046 050 054 058 062
[ 0.000000] pcpu-alloc: [2] 066 070 074 078 082 086 090 094
[ 0.000000] pcpu-alloc: [2] 098 102 106 110 114 118 122 126
[ 0.000000] pcpu-alloc: [3] 003 007 011 015 019 023 027 031
[ 0.000000] pcpu-alloc: [3] 035 039 043 047 051 055 059 063
[ 0.000000] pcpu-alloc: [3] 067 071 075 079 083 087 091 095
[ 0.000000] pcpu-alloc: [3] 099 103 107 111 115 119 123 127
[ 0.000000] Built 4 zonelists in Zone order, mobility grouping on. Total pages: 65945355
[ 0.000000] Policy zone: Normal
[ 0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-3.10.0-957.27.2.el7_lustre.pl1.x86_64 root=UUID=c52126d9-5973-45b9-8806-e607203eeb5b ro crashkernel=auto nomodeset console=ttyS0,115200 LANG=en_US.UTF-8
[ 0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
[ 0.000000] x86/fpu: xstate_offset[2]: 0240, xstate_sizes[2]: 0100
[ 0.000000] xsave: enabled xstate_bv 0x7, cntxt size 0x340 using standard form
[ 0.000000] Memory: 9570336k/270532096k available (7676k kernel code, 2559084k absent, 4697628k reserved, 6045k data, 1876k init)
[ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=128, Nodes=4
[ 0.000000] Hierarchical RCU implementation.
[ 0.000000] RCU restricting CPUs from NR_CPUS=5120 to nr_cpu_ids=128.
[ 0.000000] NR_IRQS:327936 nr_irqs:3624 0
[ 0.000000] Console: colour dummy device 80x25
[ 0.000000] console [ttyS0] enabled
[ 0.000000] allocated 1072693248 bytes of page_cgroup
[ 0.000000] please try 'cgroup_disable=memory' option if you don't want memory cgroups
[ 0.000000] Enabling automatic NUMA balancing. Configure with numa_balancing= or the kernel.numa_balancing sysctl
[ 0.000000] hpet clockevent registered
[ 0.000000] tsc: Fast TSC calibration using PIT
[ 0.000000] tsc: Detected 1996.199 MHz processor
[ 0.000057] Calibrating delay loop (skipped), value calculated using timer frequency.. 3992.39 BogoMIPS (lpj=1996199)
[ 0.010704] pid_max: default: 131072 minimum: 1024
[ 0.016295] Security Framework initialized
[ 0.020418] SELinux: Initializing.
[ 0.023976] SELinux: Starting in permissive mode
[ 0.023977] Yama: becoming mindful.
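The pcpu-alloc map above distributes per-CPU units across the four NUMA nodes round-robin: every CPU number in group [n] satisfies cpu % 4 == n. A one-liner that reproduces the grouping (the 128 possible CPUs and 4 nodes are taken from the setup_percpu line):

    # Rebuild the pcpu-alloc groups shown above: the unit for CPU c lands on node c % 4.
    groups = {node: [cpu for cpu in range(128) if cpu % 4 == node] for node in range(4)}
    for node, cpus in groups.items():
        print(f"[{node}]", " ".join(f"{c:03d}" for c in cpus))
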
[ 0.044413] Dentry cache hash table entries: 33554432 (order: 16, 268435456 bytes)
[ 0.100325] Inode-cache hash table entries: 16777216 (order: 15, 134217728 bytes)
[ 0.128024] Mount-cache hash table entries: 524288 (order: 10, 4194304 bytes)
[ 0.135437] Mountpoint-cache hash table entries: 524288 (order: 10, 4194304 bytes)
[ 0.144527] Initializing cgroup subsys memory
[ 0.148927] Initializing cgroup subsys devices
[ 0.153389] Initializing cgroup subsys freezer
[ 0.157850] Initializing cgroup subsys net_cls
[ 0.162306] Initializing cgroup subsys blkio
[ 0.166587] Initializing cgroup subsys perf_event
[ 0.171310] Initializing cgroup subsys hugetlb
[ 0.175765] Initializing cgroup subsys pids
[ 0.179959] Initializing cgroup subsys net_prio
[ 0.184573] tseg: 0070000000
[ 0.190185] LVT offset 2 assigned for vector 0xf4
[ 0.194914] Last level iTLB entries: 4KB 1024, 2MB 1024, 4MB 512
[ 0.200933] Last level dTLB entries: 4KB 1536, 2MB 1536, 4MB 768
[ 0.206948] tlb_flushall_shift: 6
[ 0.210297] Speculative Store Bypass: Mitigation: Speculative Store Bypass disabled via prctl and seccomp
[ 0.219872] FEATURE SPEC_CTRL Not Present
[ 0.223892] FEATURE IBPB_SUPPORT Present
[ 0.227828] Spectre V2 : Enabling Indirect Branch Prediction Barrier
[ 0.234265] Spectre V2 : Mitigation: Full retpoline
[ 0.240114] Freeing SMP alternatives: 28k freed
[ 0.246549] ACPI: Core revision 20130517
[ 0.255268] ACPI: All ACPI Tables successfully acquired
[ 0.266901] ftrace: allocating 29215 entries in 115 pages
[ 0.607122] Switched APIC routing to physical flat.
[ 0.614051] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[ 0.630057] smpboot: CPU0: AMD EPYC 7401P 24-Core Processor (fam: 17, model: 01, stepping: 02)
[ 0.712409] random: fast init done
[ 0.742409] APIC calibration not consistent with PM-Timer: 101ms instead of 100ms
[ 0.749884] APIC delta adjusted to PM-Timer: 623825 (636297)
[ 0.755576] Performance Events: Fam17h core perfctr, AMD PMU driver.
[ 0.762014] ... version: 0
[ 0.766025] ... bit width: 48
[ 0.770124] ... generic registers: 6
[ 0.774137] ... value mask: 0000ffffffffffff
[ 0.779448] ... max period: 00007fffffffffff
[ 0.784761] ... fixed-purpose events: 0
[ 0.788774] ... event mask: 000000000000003f
[ 0.797117] NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
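Each hash-table line above is internally consistent: the byte figure is 2^order pages of 4 KiB, and every table works out to 8 bytes per entry. A quick arithmetic check (values copied from the log lines; the 4 KiB page size is the only assumption):

    PAGE = 4096
    tables = {  # name: (entries, order, bytes), copied from the lines above
        "Dentry cache": (33554432, 16, 268435456),
        "Inode-cache":  (16777216, 15, 134217728),
        "Mount-cache":  (524288,   10, 4194304),
    }
    for name, (entries, order, nbytes) in tables.items():
        assert nbytes == (1 << order) * PAGE      # bytes = 2^order pages
        print(f"{name}: {nbytes // entries} bytes per entry")   # 8 for all three
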
[ 0.805203] smpboot: Booting Node 1, Processors #1 OK
[ 0.818413] smpboot: Booting Node 2, Processors #2 OK
[ 0.831618] smpboot: Booting Node 3, Processors #3 OK
[ 0.844809] smpboot: Booting Node 0, Processors #4 OK
[ 0.857991] smpboot: Booting Node 1, Processors #5 OK
[ 0.871179] smpboot: Booting Node 2, Processors #6 OK
[ 0.884353] smpboot: Booting Node 3, Processors #7 OK
[ 0.897535] smpboot: Booting Node 0, Processors #8 OK
[ 0.910928] smpboot: Booting Node 1, Processors #9 OK
[ 0.924125] smpboot: Booting Node 2, Processors #10 OK
[ 0.937402] smpboot: Booting Node 3, Processors #11 OK
[ 0.950675] smpboot: Booting Node 0, Processors #12 OK
[ 0.963944] smpboot: Booting Node 1, Processors #13 OK
[ 0.977217] smpboot: Booting Node 2, Processors #14 OK
[ 0.990486] smpboot: Booting Node 3, Processors #15 OK
[ 1.003758] smpboot: Booting Node 0, Processors #16 OK
[ 1.017139] smpboot: Booting Node 1, Processors #17 OK
[ 1.030424] smpboot: Booting Node 2, Processors #18 OK
[ 1.043715] smpboot: Booting Node 3, Processors #19 OK
[ 1.056982] smpboot: Booting Node 0, Processors #20 OK
[ 1.070249] smpboot: Booting Node 1, Processors #21 OK
[ 1.083525] smpboot: Booting Node 2, Processors #22 OK
[ 1.096807] smpboot: Booting Node 3, Processors #23 OK
[ 1.110074] smpboot: Booting Node 0, Processors #24 OK
[ 1.123813] smpboot: Booting Node 1, Processors #25 OK
[ 1.137054] smpboot: Booting Node 2, Processors #26 OK
[ 1.150289] smpboot: Booting Node 3, Processors #27 OK
[ 1.163523] smpboot: Booting Node 0, Processors #28 OK
[ 1.176760] smpboot: Booting Node 1, Processors #29 OK
[ 1.190001] smpboot: Booting Node 2, Processors #30 OK
[ 1.203227] smpboot: Booting Node 3, Processors #31 OK
[ 1.216459] smpboot: Booting Node 0, Processors #32 OK
[ 1.229793] smpboot: Booting Node 1, Processors #33 OK
[ 1.243035] smpboot: Booting Node 2, Processors #34 OK
[ 1.256285] smpboot: Booting Node 3, Processors #35 OK
[ 1.269519] smpboot: Booting Node 0, Processors #36 OK
[ 1.282747] smpboot: Booting Node 1, Processors #37 OK
[ 1.295990] smpboot: Booting Node 2, Processors #38 OK
[ 1.309240] smpboot: Booting Node 3, Processors #39 OK
[ 1.322473] smpboot: Booting Node 0, Processors #40 OK
[ 1.335806] smpboot: Booting Node 1, Processors #41 OK
[ 1.349150] smpboot: Booting Node 2, Processors #42 OK
[ 1.375158] do_IRQ: 43.55 No irq handler for vector (irq -1)
[ 1.362393] smpboot: Booting Node 3, Processors #43 OK
[ 1.375264] smpboot: Booting Node 0, Processors
[ 1.375265] #44
[ 1.388458] OK
[ 1.389373] smpboot: Booting Node 1, Processors #45 OK
[ 1.402618] smpboot: Booting Node 2, Processors #46 OK
[ 1.415957] smpboot: Booting Node 3, Processors #47
[ 1.428665] Brought up 48 CPUs
[ 1.431922] smpboot: Max logical packages: 3
[ 1.436199] smpboot: Total of 48 processors activated (191635.10 BogoMIPS)
[ 1.724574] node 0 initialised, 15462980 pages in 274ms
[ 1.733462] node 1 initialised, 15989367 pages in 278ms
[ 1.733476] node 2 initialised, 15989367 pages in 278ms
[ 1.735885] node 3 initialised, 15984544 pages in 281ms
[ 1.749711] devtmpfs: initialized
[ 1.775110] EVM: security.selinux
[ 1.778434] EVM: security.ima
[ 1.781406] EVM: security.capability
[ 1.785084] PM: Registering ACPI NVS region [mem 0x0008f000-0x0008ffff] (4096 bytes)
[ 1.792826] PM: Registering ACPI NVS region [mem 0x6efcf000-0x6fdfefff] (14876672 bytes)
[ 1.802488] atomic64 test passed for x86-64 platform with CX8 and with SSE
[ 1.809367] pinctrl core: initialized pinctrl subsystem
[ 1.814698] RTC time: 1:52:56, date: 11/04/19
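The bring-up above rotates through nodes 1, 2, 3, 0, so the 48 online CPUs land 12 per node; the other 80 possible CPUs stay offline as hotplug slots. A sketch that tallies the smpboot lines from a captured log; the #44 message is split across three console writes, so it escapes the pattern and node 0 counts one low:

    import re
    from collections import Counter

    BOOT = re.compile(r"smpboot: Booting Node (\d+), Processors #(\d+)")

    per_node = Counter({0: 1})      # CPU0 is brought up earlier, on node 0
    with open("boot.log") as log:
        for line in log:
            m = BOOT.search(line)
            if m:
                per_node[int(m.group(1))] += 1

    print(dict(per_node))           # expect 12/12/12 and 11 for node 0 (split #44 line)
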
[ 1.819301] NET: Registered protocol family 16
[ 1.824106] ACPI FADT declares the system doesn't support PCIe ASPM, so disable it
[ 1.831676] ACPI: bus type PCI registered
[ 1.835691] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
[ 1.842274] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
[ 1.851577] PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved in E820
[ 1.858368] PCI: Using configuration type 1 for base access
[ 1.863953] PCI: Dell System detected, enabling pci=bfsort.
[ 1.879027] ACPI: Added _OSI(Module Device)
[ 1.883222] ACPI: Added _OSI(Processor Device)
[ 1.887673] ACPI: Added _OSI(3.0 _SCP Extensions)
[ 1.892378] ACPI: Added _OSI(Processor Aggregator Device)
[ 1.897779] ACPI: Added _OSI(Linux-Dell-Video)
[ 1.903049] ACPI: EC: Look up EC in DSDT
[ 1.904022] ACPI: Executed 2 blocks of module-level executable AML code
[ 1.916063] ACPI: Interpreter enabled
[ 1.919738] ACPI: (supports S0 S5)
[ 1.923147] ACPI: Using IOAPIC for interrupt routing
[ 1.928327] HEST: Table parsing has been initialized.
[ 1.933385] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
[ 1.942531] ACPI: Enabled 1 GPEs in block 00 to 1F
[ 1.954140] ACPI: PCI Interrupt Link [LNKA] (IRQs 4 5 7 10 11 14 15) *0
[ 1.961047] ACPI: PCI Interrupt Link [LNKB] (IRQs 4 5 7 10 11 14 15) *0
[ 1.967953] ACPI: PCI Interrupt Link [LNKC] (IRQs 4 5 7 10 11 14 15) *0
[ 1.974859] ACPI: PCI Interrupt Link [LNKD] (IRQs 4 5 7 10 11 14 15) *0
[ 1.981767] ACPI: PCI Interrupt Link [LNKE] (IRQs 4 5 7 10 11 14 15) *0
[ 1.988676] ACPI: PCI Interrupt Link [LNKF] (IRQs 4 5 7 10 11 14 15) *0
[ 1.995581] ACPI: PCI Interrupt Link [LNKG] (IRQs 4 5 7 10 11 14 15) *0
[ 2.002490] ACPI: PCI Interrupt Link [LNKH] (IRQs 4 5 7 10 11 14 15) *0
[ 2.009540] ACPI: PCI Root Bridge [PC00] (domain 0000 [bus 00-3f])
[ 2.015724] acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI]
[ 2.023939] acpi PNP0A08:00: PCIe AER handled by firmware
[ 2.029383] acpi PNP0A08:00: _OSC: platform does not support [SHPCHotplug]
[ 2.036332] acpi PNP0A08:00: _OSC: OS now controls [PCIeHotplug PME PCIeCapability]
[ 2.043989] acpi PNP0A08:00: FADT indicates ASPM is unsupported, using BIOS configuration
[ 2.052445] PCI host bridge to bus 0000:00
[ 2.056550] pci_bus 0000:00: root bus resource [io 0x0000-0x03af window]
[ 2.063334] pci_bus 0000:00: root bus resource [io 0x03e0-0x0cf7 window]
[ 2.070121] pci_bus 0000:00: root bus resource [mem 0x000c0000-0x000c3fff window]
[ 2.077599] pci_bus 0000:00: root bus resource [mem 0x000c4000-0x000c7fff window]
[ 2.085078] pci_bus 0000:00: root bus resource [mem 0x000c8000-0x000cbfff window]
[ 2.092560] pci_bus 0000:00: root bus resource [mem 0x000cc000-0x000cffff window]
[ 2.100038] pci_bus 0000:00: root bus resource [mem 0x000d0000-0x000d3fff window]
[ 2.107518] pci_bus 0000:00: root bus resource [mem 0x000d4000-0x000d7fff window]
[ 2.114998] pci_bus 0000:00: root bus resource [mem 0x000d8000-0x000dbfff window]
[ 2.122476] pci_bus 0000:00: root bus resource [mem 0x000dc000-0x000dffff window]
[ 2.129958] pci_bus 0000:00: root bus resource [mem 0x000e0000-0x000e3fff window]
[ 2.137436] pci_bus 0000:00: root bus resource [mem 0x000e4000-0x000e7fff window]
[ 2.144916] pci_bus 0000:00: root bus resource [mem 0x000e8000-0x000ebfff window]
[ 2.152395] pci_bus 0000:00: root bus resource [mem 0x000ec000-0x000effff window]
[ 2.159874] pci_bus 0000:00: root bus resource [mem 0x000f0000-0x000fffff window]
[ 2.167354] pci_bus 0000:00: root bus resource [io 0x0d00-0x3fff window]
[ 2.174141] pci_bus 0000:00: root bus resource [mem 0xe1000000-0xfebfffff window]
[ 2.181619] pci_bus 0000:00: root bus resource [mem 0x10000000000-0x2bf3fffffff window]
[ 2.189619] pci_bus 0000:00: root bus resource [bus 00-3f]
[ 2.195113] pci 0000:00:00.0: [1022:1450] type 00 class 0x060000
[ 2.195198] pci 0000:00:00.2: [1022:1451] type 00 class 0x080600
[ 2.195285] pci 0000:00:01.0: [1022:1452] type 00 class 0x060000
[ 2.195363] pci 0000:00:02.0: [1022:1452] type 00 class 0x060000
[ 2.195437] pci 0000:00:03.0: [1022:1452] type 00 class 0x060000
[ 2.195499] pci 0000:00:03.1: [1022:1453] type 01 class 0x060400
[ 2.195802] pci 0000:00:03.1: PME# supported from D0 D3hot D3cold
[ 2.195902] pci 0000:00:04.0: [1022:1452] type 00 class 0x060000
[ 2.195983] pci 0000:00:07.0: [1022:1452] type 00 class 0x060000
[ 2.196043] pci 0000:00:07.1: [1022:1454] type 01 class 0x060400
[ 2.196810] pci 0000:00:07.1: PME# supported from D0 D3hot D3cold
[ 2.196890] pci 0000:00:08.0: [1022:1452] type 00 class 0x060000
[ 2.196951] pci 0000:00:08.1: [1022:1454] type 01 class 0x060400
[ 2.197797] pci 0000:00:08.1: PME# supported from D0 D3hot D3cold
[ 2.197911] pci 0000:00:14.0: [1022:790b] type 00 class 0x0c0500
[ 2.198112] pci 0000:00:14.3: [1022:790e] type 00 class 0x060100
[ 2.198316] pci 0000:00:18.0: [1022:1460] type 00 class 0x060000
[ 2.198368] pci 0000:00:18.1: [1022:1461] type 00 class 0x060000
[ 2.198419] pci 0000:00:18.2: [1022:1462] type 00 class 0x060000
[ 2.198470] pci 0000:00:18.3: [1022:1463] type 00 class 0x060000
[ 2.198519] pci 0000:00:18.4: [1022:1464] type 00 class 0x060000
[ 2.198570] pci 0000:00:18.5: [1022:1465] type 00 class 0x060000
[ 2.198621] pci 0000:00:18.6: [1022:1466] type 00 class 0x060000
[ 2.198674] pci 0000:00:18.7: [1022:1467] type 00 class 0x060000
[ 2.198725] pci 0000:00:19.0: [1022:1460] type 00 class 0x060000
[ 2.198779] pci 0000:00:19.1: [1022:1461] type 00 class 0x060000
[ 2.198834] pci 0000:00:19.2: [1022:1462] type 00 class 0x060000
[ 2.198889] pci 0000:00:19.3: [1022:1463] type 00 class 0x060000
[ 2.198941] pci 0000:00:19.4: [1022:1464] type 00 class 0x060000
[ 2.198995] pci 0000:00:19.5: [1022:1465] type 00 class 0x060000
[ 2.199049] pci 0000:00:19.6: [1022:1466] type 00 class 0x060000
[ 2.199104] pci 0000:00:19.7: [1022:1467] type 00 class 0x060000
[ 2.199157] pci 0000:00:1a.0: [1022:1460] type 00 class 0x060000
[ 2.199212] pci 0000:00:1a.1: [1022:1461] type 00 class 0x060000
[ 2.199264] pci 0000:00:1a.2: [1022:1462] type 00 class 0x060000
[ 2.199320] pci 0000:00:1a.3: [1022:1463] type 00 class 0x060000
[ 2.199373] pci 0000:00:1a.4: [1022:1464] type 00 class 0x060000
[ 2.199428] pci 0000:00:1a.5: [1022:1465] type 00 class 0x060000
[ 2.199482] pci 0000:00:1a.6: [1022:1466] type 00 class 0x060000
[ 2.199537] pci 0000:00:1a.7: [1022:1467] type 00 class 0x060000
[ 2.199590] pci 0000:00:1b.0: [1022:1460] type 00 class 0x060000
[ 2.199647] pci 0000:00:1b.1: [1022:1461] type 00 class 0x060000
[ 2.199700] pci 0000:00:1b.2: [1022:1462] type 00 class 0x060000
[ 2.199754] pci 0000:00:1b.3: [1022:1463] type 00 class 0x060000
[ 2.199807] pci 0000:00:1b.4: [1022:1464] type 00 class 0x060000
[ 2.199862] pci 0000:00:1b.5: [1022:1465] type 00 class 0x060000
[ 2.199915] pci 0000:00:1b.6: [1022:1466] type 00 class 0x060000
[ 2.199969] pci 0000:00:1b.7: [1022:1467] type 00 class 0x060000
[ 2.200824] pci 0000:01:00.0: [15b3:101b] type 00 class 0x020700
[ 2.200972] pci 0000:01:00.0: reg 0x10: [mem 0xe2000000-0xe3ffffff 64bit pref]
[ 2.201207] pci 0000:01:00.0: reg 0x30: [mem 0xfff00000-0xffffffff pref]
[ 2.201613] pci 0000:01:00.0: PME# supported from D3cold
[ 2.201893] pci 0000:00:03.1: PCI bridge to [bus 01]
[ 2.206863] pci 0000:00:03.1: bridge window [mem 0xe2000000-0xe3ffffff 64bit pref]
[ 2.206939] pci 0000:02:00.0: [1022:145a] type 00 class 0x130000
[ 2.207036] pci 0000:02:00.2: [1022:1456] type 00 class 0x108000
[ 2.207054] pci 0000:02:00.2: reg 0x18: [mem 0xf7300000-0xf73fffff]
[ 2.207066] pci 0000:02:00.2: reg 0x24: [mem 0xf7400000-0xf7401fff]
[ 2.207143] pci 0000:02:00.3: [1022:145f] type 00 class 0x0c0330
[ 2.207155] pci 0000:02:00.3: reg 0x10: [mem 0xf7200000-0xf72fffff 64bit]
[ 2.207203] pci 0000:02:00.3: PME# supported from D0 D3hot D3cold
[ 2.207262] pci 0000:00:07.1: PCI bridge to [bus 02]
[ 2.212236] pci 0000:00:07.1: bridge window [mem 0xf7200000-0xf74fffff]
[ 2.212819] pci 0000:03:00.0: [1022:1455] type 00 class 0x130000
[ 2.212927] pci 0000:03:00.1: [1022:1468] type 00 class 0x108000
[ 2.212945] pci 0000:03:00.1: reg 0x18: [mem 0xf7000000-0xf70fffff]
[ 2.212958] pci 0000:03:00.1: reg 0x24: [mem 0xf7100000-0xf7101fff]
[ 2.213050] pci 0000:00:08.1: PCI bridge to [bus 03]
[ 2.218015] pci 0000:00:08.1: bridge window [mem 0xf7000000-0xf71fffff]
[ 2.218031] pci_bus 0000:00: on NUMA node 0
[ 2.218405] ACPI: PCI Root Bridge [PC01] (domain 0000 [bus 40-7f])
[ 2.224586] acpi PNP0A08:01: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI]
[ 2.232804] acpi PNP0A08:01: PCIe AER handled by firmware
[ 2.238246] acpi PNP0A08:01: _OSC: platform does not support [SHPCHotplug]
[ 2.245193] acpi PNP0A08:01: _OSC: OS now controls [PCIeHotplug PME PCIeCapability]
[ 2.252844] acpi PNP0A08:01: FADT indicates ASPM is unsupported, using BIOS configuration
[ 2.261255] PCI host bridge to bus 0000:40
[ 2.265360] pci_bus 0000:40: root bus resource [io 0x4000-0x7fff window]
[ 2.272145] pci_bus 0000:40: root bus resource [mem 0xc6000000-0xe0ffffff window]
[ 2.279626] pci_bus 0000:40: root bus resource [mem 0x2bf40000000-0x47e7fffffff window]
[ 2.287627] pci_bus 0000:40: root bus resource [bus 40-7f]
[ 2.293114] pci 0000:40:00.0: [1022:1450] type 00 class 0x060000
[ 2.293185] pci 0000:40:00.2: [1022:1451] type 00 class 0x080600
[ 2.293276] pci 0000:40:01.0: [1022:1452] type 00 class 0x060000
[ 2.293352] pci 0000:40:02.0: [1022:1452] type 00 class 0x060000
[ 2.293428] pci 0000:40:03.0: [1022:1452] type 00 class 0x060000
[ 2.293502] pci 0000:40:04.0: [1022:1452] type 00 class 0x060000
[ 2.293581] pci 0000:40:07.0: [1022:1452] type 00 class 0x060000
[ 2.293643] pci 0000:40:07.1: [1022:1454] type 01 class 0x060400
[ 2.294078] pci 0000:40:07.1: PME# supported from D0 D3hot D3cold
[ 2.294158] pci 0000:40:08.0: [1022:1452] type 00 class 0x060000
[ 2.294222] pci 0000:40:08.1: [1022:1454] type 01 class 0x060400
[ 2.294335] pci 0000:40:08.1: PME# supported from D0 D3hot D3cold
[ 2.295017] pci 0000:41:00.0: [1022:145a] type 00 class 0x130000
[ 2.295123] pci 0000:41:00.2: [1022:1456] type 00 class 0x108000
[ 2.295141] pci 0000:41:00.2: reg 0x18: [mem 0xdb300000-0xdb3fffff]
[ 2.295155] pci 0000:41:00.2: reg 0x24: [mem 0xdb400000-0xdb401fff]
[ 2.295239] pci 0000:41:00.3: [1022:145f] type 00 class 0x0c0330
[ 2.295252] pci 0000:41:00.3: reg 0x10: [mem 0xdb200000-0xdb2fffff 64bit]
[ 2.295306] pci 0000:41:00.3: PME# supported from D0 D3hot D3cold
[ 2.295368] pci 0000:40:07.1: PCI bridge to [bus 41]
[ 2.300333] pci 0000:40:07.1: bridge window [mem 0xdb200000-0xdb4fffff]
[ 2.300429] pci 0000:42:00.0: [1022:1455] type 00 class 0x130000
[ 2.300547] pci 0000:42:00.1: [1022:1468] type 00 class 0x108000
[ 2.300567] pci 0000:42:00.1: reg 0x18: [mem 0xdb000000-0xdb0fffff]
[ 2.300581] pci 0000:42:00.1: reg 0x24: [mem 0xdb100000-0xdb101fff]
[ 2.300682] pci 0000:40:08.1: PCI bridge to [bus 42]
[ 2.305655] pci 0000:40:08.1: bridge window [mem 0xdb000000-0xdb1fffff]
[ 2.305668] pci_bus 0000:40: on NUMA node 1
[ 2.305846] ACPI: PCI Root Bridge [PC02] (domain 0000 [bus 80-bf])
[ 2.312024] acpi PNP0A08:02: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI]
[ 2.320235] acpi PNP0A08:02: PCIe AER handled by firmware
[ 2.325679] acpi PNP0A08:02: _OSC: platform does not support [SHPCHotplug]
[ 2.332624] acpi PNP0A08:02: _OSC: OS now controls [PCIeHotplug PME PCIeCapability]
[ 2.340275] acpi PNP0A08:02: FADT indicates ASPM is unsupported, using BIOS configuration
[ 2.348714] PCI host bridge to bus 0000:80
[ 2.352820] pci_bus 0000:80: root bus resource [io 0x03b0-0x03df window]
[ 2.359603] pci_bus 0000:80: root bus resource [mem 0x000a0000-0x000bffff window]
[ 2.367084] pci_bus 0000:80: root bus resource [io 0x8000-0xbfff window]
[ 2.373870] pci_bus 0000:80: root bus resource [mem 0xab000000-0xc5ffffff window]
[ 2.381350] pci_bus 0000:80: root bus resource [mem 0x47e80000000-0x63dbfffffff window]
[ 2.389348] pci_bus 0000:80: root bus resource [bus 80-bf]
[ 2.394842] pci 0000:80:00.0: [1022:1450] type 00 class 0x060000
[ 2.394914] pci 0000:80:00.2: [1022:1451] type 00 class 0x080600
[ 2.395002] pci 0000:80:01.0: [1022:1452] type 00 class 0x060000
[ 2.395066] pci 0000:80:01.1: [1022:1453] type 01 class 0x060400
[ 2.395192] pci 0000:80:01.1: PME# supported from D0 D3hot D3cold
[ 2.395263] pci 0000:80:01.2: [1022:1453] type 01 class 0x060400
[ 2.395855] pci 0000:80:01.2: PME# supported from D0 D3hot D3cold
[ 2.395935] pci 0000:80:02.0: [1022:1452] type 00 class 0x060000
[ 2.396011] pci 0000:80:03.0: [1022:1452] type 00 class 0x060000
[ 2.396071] pci 0000:80:03.1: [1022:1453] type 01 class 0x060400
[ 2.396184] pci 0000:80:03.1: PME# supported from D0 D3hot D3cold
[ 2.396281] pci 0000:80:04.0: [1022:1452] type 00 class 0x060000
[ 2.396365] pci 0000:80:07.0: [1022:1452] type 00 class 0x060000
[ 2.396428] pci 0000:80:07.1: [1022:1454] type 01 class 0x060400
[ 2.396834] pci 0000:80:07.1: PME# supported from D0 D3hot D3cold
[ 2.396912] pci 0000:80:08.0: [1022:1452] type 00 class 0x060000
[ 2.396974] pci 0000:80:08.1: [1022:1454] type 01 class 0x060400
[ 2.397086] pci 0000:80:08.1: PME# supported from D0 D3hot D3cold
[ 2.397299] pci 0000:81:00.0: [14e4:165f] type 00 class 0x020000
[ 2.397324] pci 0000:81:00.0: reg 0x10: [mem 0xac230000-0xac23ffff 64bit pref]
[ 2.397340] pci 0000:81:00.0: reg 0x18: [mem 0xac240000-0xac24ffff 64bit pref]
[ 2.397355] pci 0000:81:00.0: reg 0x20: [mem 0xac250000-0xac25ffff 64bit pref]
[ 2.397365] pci 0000:81:00.0: reg 0x30: [mem 0xfffc0000-0xffffffff pref]
[ 2.397441] pci 0000:81:00.0: PME# supported from D0 D3hot D3cold
[ 2.397537] pci 0000:81:00.1: [14e4:165f] type 00 class 0x020000
[ 2.397562] pci 0000:81:00.1: reg 0x10: [mem 0xac200000-0xac20ffff 64bit pref]
[ 2.397577] pci 0000:81:00.1: reg 0x18: [mem 0xac210000-0xac21ffff 64bit pref]
[ 2.397592] pci 0000:81:00.1: reg 0x20: [mem 0xac220000-0xac22ffff 64bit pref]
[ 2.397602] pci 0000:81:00.1: reg 0x30: [mem 0xfffc0000-0xffffffff pref]
[ 2.397679] pci 0000:81:00.1: PME# supported from D0 D3hot D3cold
[ 2.397766] pci 0000:80:01.1: PCI bridge to [bus 81]
[ 2.402737] pci 0000:80:01.1: bridge window [mem 0xac200000-0xac2fffff 64bit pref]
[ 2.402820] pci 0000:82:00.0: [1556:be00] type 01 class 0x060400
[ 2.405654] pci 0000:80:01.2: PCI bridge to [bus 82-83]
[ 2.410881] pci 0000:80:01.2: bridge window [mem 0xc0000000-0xc08fffff]
[ 2.410885] pci 0000:80:01.2: bridge window [mem 0xab000000-0xabffffff 64bit pref]
[ 2.410932] pci 0000:83:00.0: [102b:0536] type 00 class 0x030000
[ 2.410950] pci 0000:83:00.0: reg 0x10: [mem 0xab000000-0xabffffff pref]
[ 2.410962] pci 0000:83:00.0: reg 0x14: [mem 0xc0808000-0xc080bfff]
[ 2.410973] pci 0000:83:00.0: reg 0x18: [mem 0xc0000000-0xc07fffff]
[ 2.411114] pci 0000:82:00.0: PCI bridge to [bus 83]
[ 2.416086] pci 0000:82:00.0: bridge window [mem 0xc0000000-0xc08fffff]
[ 2.416092] pci 0000:82:00.0: bridge window [mem 0xab000000-0xabffffff 64bit pref]
[ 2.416175] pci 0000:84:00.0: [1000:00d1] type 00 class 0x010700
[ 2.416197] pci 0000:84:00.0: reg 0x10: [mem 0xac000000-0xac0fffff 64bit pref]
[ 2.416208] pci 0000:84:00.0: reg 0x18: [mem 0xac100000-0xac1fffff 64bit pref]
[ 2.416215] pci 0000:84:00.0: reg 0x20: [mem 0xc0d00000-0xc0dfffff]
[ 2.416222] pci 0000:84:00.0: reg 0x24: [io 0x8000-0x80ff]
[ 2.416231] pci 0000:84:00.0: reg 0x30: [mem 0x00000000-0x0003ffff pref]
[ 2.416282] pci 0000:84:00.0: supports D1 D2
[ 2.418653] pci 0000:80:03.1: PCI bridge to [bus 84]
[ 2.423621] pci 0000:80:03.1: bridge window [io 0x8000-0x8fff]
[ 2.423624] pci 0000:80:03.1: bridge window [mem 0xc0d00000-0xc0dfffff]
[ 2.423628] pci 0000:80:03.1: bridge window [mem 0xac000000-0xac1fffff 64bit pref]
[ 2.423869] pci 0000:85:00.0: [1022:145a] type 00 class 0x130000
[ 2.423973] pci 0000:85:00.2: [1022:1456] type 00 class 0x108000
[ 2.423992] pci 0000:85:00.2: reg 0x18: [mem 0xc0b00000-0xc0bfffff]
[ 2.424005] pci 0000:85:00.2: reg 0x24: [mem 0xc0c00000-0xc0c01fff]
[ 2.424097] pci 0000:80:07.1: PCI bridge to [bus 85]
[ 2.429064] pci 0000:80:07.1: bridge window [mem 0xc0b00000-0xc0cfffff]
[ 2.429158] pci 0000:86:00.0: [1022:1455] type 00 class 0x130000
[ 2.429275] pci 0000:86:00.1: [1022:1468] type 00 class 0x108000
[ 2.429295] pci 0000:86:00.1: reg 0x18: [mem 0xc0900000-0xc09fffff]
[ 2.429309] pci 0000:86:00.1: reg 0x24: [mem 0xc0a00000-0xc0a01fff]
[ 2.429398] pci 0000:86:00.2: [1022:7901] type 00 class 0x010601
[ 2.429430] pci 0000:86:00.2: reg 0x24: [mem 0xc0a02000-0xc0a02fff]
[ 2.429468] pci 0000:86:00.2: PME# supported from D3hot D3cold
[ 2.429535] pci 0000:80:08.1: PCI bridge to [bus 86]
[ 2.434508] pci 0000:80:08.1: bridge window [mem 0xc0900000-0xc0afffff]
[ 2.434533] pci_bus 0000:80: on NUMA node 2
[ 2.434704] ACPI: PCI Root Bridge [PC03] (domain 0000 [bus c0-ff])
[ 2.440884] acpi PNP0A08:03: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI]
[ 2.449093] acpi PNP0A08:03: PCIe AER handled by firmware
[ 2.454528] acpi PNP0A08:03: _OSC: platform does not support [SHPCHotplug]
[ 2.461475] acpi PNP0A08:03: _OSC: OS now controls [PCIeHotplug PME PCIeCapability]
[ 2.469128] acpi PNP0A08:03: FADT indicates ASPM is unsupported, using BIOS configuration
[ 2.477454] acpi PNP0A08:03: host bridge window [mem 0x63dc0000000-0xffffffffffff window] ([0x80000000000-0xffffffffffff] ignored, not CPU addressable)
[ 2.491091] PCI host bridge to bus 0000:c0
[ 2.495188] pci_bus 0000:c0: root bus resource [io 0xc000-0xffff window]
[ 2.501975] pci_bus 0000:c0: root bus resource [mem 0x90000000-0xaaffffff window]
[ 2.509454] pci_bus 0000:c0: root bus resource [mem 0x63dc0000000-0x7ffffffffff window]
[ 2.517455] pci_bus 0000:c0: root bus resource [bus c0-ff]
[ 2.522945] pci 0000:c0:00.0: [1022:1450] type 00 class 0x060000
[ 2.523015] pci 0000:c0:00.2: [1022:1451] type 00 class 0x080600
[ 2.523104] pci 0000:c0:01.0: [1022:1452] type 00 class 0x060000
[ 2.523165] pci 0000:c0:01.1: [1022:1453] type 01 class 0x060400
[ 2.523294] pci 0000:c0:01.1: PME# supported from D0 D3hot D3cold
[ 2.523392] pci 0000:c0:02.0: [1022:1452] type 00 class 0x060000
[ 2.523466] pci 0000:c0:03.0: [1022:1452] type 00 class 0x060000
[ 2.523541] pci 0000:c0:04.0: [1022:1452] type 00 class 0x060000
[ 2.523619] pci 0000:c0:07.0: [1022:1452] type 00 class 0x060000
[ 2.523683] pci 0000:c0:07.1: [1022:1454] type 01 class 0x060400
[ 2.524121] pci 0000:c0:07.1: PME# supported from D0 D3hot D3cold
[ 2.524199] pci 0000:c0:08.0: [1022:1452] type 00 class 0x060000
[ 2.524262] pci 0000:c0:08.1: [1022:1454] type 01 class 0x060400
[ 2.524374] pci 0000:c0:08.1: PME# supported from D0 D3hot D3cold
[ 2.525052] pci 0000:c1:00.0: [1000:005f] type 00 class 0x010400
[ 2.525065] pci 0000:c1:00.0: reg 0x10: [io 0xc000-0xc0ff]
[ 2.525075] pci 0000:c1:00.0: reg 0x14: [mem 0xa5500000-0xa550ffff 64bit]
[ 2.525085] pci 0000:c1:00.0: reg 0x1c: [mem 0xa5400000-0xa54fffff 64bit]
[ 2.525097] pci 0000:c1:00.0: reg 0x30: [mem 0xfff00000-0xffffffff pref]
[ 2.525146] pci 0000:c1:00.0: supports D1 D2
[ 2.525196] pci 0000:c0:01.1: PCI bridge to [bus c1]
[ 2.530162] pci 0000:c0:01.1: bridge window [io 0xc000-0xcfff]
[ 2.530165] pci 0000:c0:01.1: bridge window [mem 0xa5400000-0xa55fffff]
[ 2.530256] pci 0000:c2:00.0: [1022:145a] type 00 class 0x130000
[ 2.530362] pci 0000:c2:00.2: [1022:1456] type 00 class 0x108000
[ 2.530381] pci 0000:c2:00.2: reg 0x18: [mem 0xa5200000-0xa52fffff]
[ 2.530394] pci 0000:c2:00.2: reg 0x24: [mem 0xa5300000-0xa5301fff]
[ 2.530486] pci 0000:c0:07.1: PCI bridge to [bus c2]
[ 2.535460] pci 0000:c0:07.1: bridge window [mem 0xa5200000-0xa53fffff]
[ 2.535554] pci 0000:c3:00.0: [1022:1455] type 00 class 0x130000
[ 2.535672] pci 0000:c3:00.1: [1022:1468] type 00 class 0x108000
[ 2.535691] pci 0000:c3:00.1: reg 0x18: [mem 0xa5000000-0xa50fffff]
[ 2.535705] pci 0000:c3:00.1: reg 0x24: [mem 0xa5100000-0xa5101fff]
[ 2.535806] pci 0000:c0:08.1: PCI bridge to [bus c3]
[ 2.540773] pci 0000:c0:08.1: bridge window [mem 0xa5000000-0xa51fffff]
[ 2.540790] pci_bus 0000:c0: on NUMA node 3
[ 2.542922] vgaarb: device added: PCI:0000:83:00.0,decodes=io+mem,owns=io+mem,locks=none
[ 2.551017] vgaarb: loaded
[ 2.553733] vgaarb: bridge control possible 0000:83:00.0
[ 2.559161] SCSI subsystem initialized
[ 2.562938] ACPI: bus type USB registered
[ 2.566967] usbcore: registered new interface driver usbfs
[ 2.572461] usbcore: registered new interface driver hub
[ 2.577986] usbcore: registered new device driver usb
[ 2.583362] EDAC MC: Ver: 3.0.0
[ 2.586766] PCI: Using ACPI for IRQ routing
[ 2.609922] PCI: pci_cache_line_size set to 64 bytes
[ 2.610074] e820: reserve RAM buffer [mem 0x0008f000-0x0008ffff]
[ 2.610076] e820: reserve RAM buffer [mem 0x3788e020-0x37ffffff]
[ 2.610078] e820: reserve RAM buffer [mem 0x378a7020-0x37ffffff]
[ 2.610080] e820: reserve RAM buffer [mem 0x378cd020-0x37ffffff]
[ 2.610081] e820: reserve RAM buffer [mem 0x378d6020-0x37ffffff]
[ 2.610083] e820: reserve RAM buffer [mem 0x37908020-0x37ffffff]
[ 2.610084] e820: reserve RAM buffer [mem 0x3793a020-0x37ffffff]
[ 2.610085] e820: reserve RAM buffer [mem 0x4f781000-0x4fffffff]
[ 2.610087] e820: reserve RAM buffer [mem 0x6cacf000-0x6fffffff]
[ 2.610088] e820: reserve RAM buffer [mem 0x107f380000-0x107fffffff]
[ 2.610089] e820: reserve RAM buffer [mem 0x207ff80000-0x207fffffff]
[ 2.610090] e820: reserve RAM buffer [mem 0x307ff80000-0x307fffffff]
[ 2.610092] e820: reserve RAM buffer [mem 0x407ff80000-0x407fffffff]
[ 2.610354] NetLabel: Initializing
[ 2.613762] NetLabel: domain hash size = 128
[ 2.618121] NetLabel: protocols = UNLABELED CIPSOv4
[ 2.623102] NetLabel: unlabeled traffic allowed by default
[ 2.628872] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0
[ 2.633851] hpet0: 3 comparators, 32-bit 14.318180 MHz counter
[ 2.641859] Switched to clocksource hpet
[ 2.650408] pnp: PnP ACPI init
[ 2.653481] ACPI: bus type PNP registered
[ 2.657681] system 00:00: [mem 0x80000000-0x8fffffff] has been reserved
[ 2.664309] system 00:00: Plug and Play ACPI device, IDs PNP0c01 (active)
[ 2.664371] pnp 00:01: Plug and Play ACPI device, IDs PNP0b00 (active)
[ 2.664565] pnp 00:02: Plug and Play ACPI device, IDs PNP0501 (active)
[ 2.664740] pnp 00:03: Plug and Play ACPI device, IDs PNP0501 (active)
[ 2.664916] pnp: PnP ACPI: found 4 devices
[ 2.669021] ACPI: bus type PNP unregistered
[ 2.680490] pci 0000:01:00.0: can't claim BAR 6 [mem 0xfff00000-0xffffffff pref]: no compatible bridge window
[ 2.690404] pci 0000:81:00.0: can't claim BAR 6 [mem 0xfffc0000-0xffffffff pref]: no compatible bridge window
[ 2.700318] pci 0000:81:00.1: can't claim BAR 6 [mem 0xfffc0000-0xffffffff pref]: no compatible bridge window
[ 2.710234] pci 0000:c1:00.0: can't claim BAR 6 [mem 0xfff00000-0xffffffff pref]: no compatible bridge window
[ 2.720171] pci 0000:00:03.1: BAR 14: assigned [mem 0xe1000000-0xe10fffff]
[ 2.727056] pci 0000:01:00.0: BAR 6: assigned [mem 0xe1000000-0xe10fffff pref]
[ 2.734284] pci 0000:00:03.1: PCI bridge to [bus 01]
[ 2.739259] pci 0000:00:03.1: bridge window [mem 0xe1000000-0xe10fffff]
[ 2.746055] pci 0000:00:03.1: bridge window [mem 0xe2000000-0xe3ffffff 64bit pref]
[ 2.753804] pci 0000:00:07.1: PCI bridge to [bus 02]
[ 2.758779] pci 0000:00:07.1: bridge window [mem 0xf7200000-0xf74fffff]
[ 2.765575] pci 0000:00:08.1: PCI bridge to [bus 03]
[ 2.770546] pci 0000:00:08.1: bridge window [mem 0xf7000000-0xf71fffff]
[ 2.777347] pci_bus 0000:00: resource 4 [io 0x0000-0x03af window]
[ 2.777349] pci_bus 0000:00: resource 5 [io 0x03e0-0x0cf7 window]
[ 2.777350] pci_bus 0000:00: resource 6 [mem 0x000c0000-0x000c3fff window]
[ 2.777352] pci_bus 0000:00: resource 7 [mem 0x000c4000-0x000c7fff window]
[ 2.777354] pci_bus 0000:00: resource 8 [mem 0x000c8000-0x000cbfff window]
[ 2.777355] pci_bus 0000:00: resource 9 [mem 0x000cc000-0x000cffff window]
[ 2.777357] pci_bus 0000:00: resource 10 [mem 0x000d0000-0x000d3fff window]
[ 2.777359] pci_bus 0000:00: resource 11 [mem 0x000d4000-0x000d7fff window]
[ 2.777360] pci_bus 0000:00: resource 12 [mem 0x000d8000-0x000dbfff window]
[ 2.777362] pci_bus 0000:00: resource 13 [mem 0x000dc000-0x000dffff window]
[ 2.777364] pci_bus 0000:00: resource 14 [mem 0x000e0000-0x000e3fff window]
[ 2.777365] pci_bus 0000:00: resource 15 [mem 0x000e4000-0x000e7fff window]
[ 2.777367] pci_bus 0000:00: resource 16 [mem 0x000e8000-0x000ebfff window]
[ 2.777369] pci_bus 0000:00: resource 17 [mem 0x000ec000-0x000effff window]
[ 2.777370] pci_bus 0000:00: resource 18 [mem 0x000f0000-0x000fffff window]
[ 2.777372] pci_bus 0000:00: resource 19 [io 0x0d00-0x3fff window]
[ 2.777374] pci_bus 0000:00: resource 20 [mem 0xe1000000-0xfebfffff window]
[ 2.777375] pci_bus 0000:00: resource 21 [mem 0x10000000000-0x2bf3fffffff window]
[ 2.777377] pci_bus 0000:01: resource 1 [mem 0xe1000000-0xe10fffff]
[ 2.777379] pci_bus 0000:01: resource 2 [mem 0xe2000000-0xe3ffffff 64bit pref]
[ 2.777381] pci_bus 0000:02: resource 1 [mem 0xf7200000-0xf74fffff]
[ 2.777383] pci_bus 0000:03: resource 1 [mem 0xf7000000-0xf71fffff]
[ 2.777394] pci 0000:40:07.1: PCI bridge to [bus 41]
[ 2.782369] pci 0000:40:07.1: bridge window [mem 0xdb200000-0xdb4fffff]
[ 2.789165] pci 0000:40:08.1: PCI bridge to [bus 42]
[ 2.794140] pci 0000:40:08.1: bridge window [mem 0xdb000000-0xdb1fffff]
[ 2.800937] pci_bus 0000:40: resource 4 [io 0x4000-0x7fff window]
[ 2.800939] pci_bus 0000:40: resource 5 [mem 0xc6000000-0xe0ffffff window]
[ 2.800941] pci_bus 0000:40: resource 6 [mem 0x2bf40000000-0x47e7fffffff window]
[ 2.800943] pci_bus 0000:41: resource 1 [mem 0xdb200000-0xdb4fffff]
[ 2.800944] pci_bus 0000:42: resource 1 [mem 0xdb000000-0xdb1fffff]
[ 2.800976] pci 0000:80:01.1: BAR 14: assigned [mem 0xac300000-0xac3fffff]
[ 2.807858] pci 0000:81:00.0: BAR 6: assigned [mem 0xac300000-0xac33ffff pref]
[ 2.815085] pci 0000:81:00.1: BAR 6: assigned [mem 0xac340000-0xac37ffff pref]
[ 2.822313] pci 0000:80:01.1: PCI bridge to [bus 81]
[ 2.827291] pci 0000:80:01.1: bridge window [mem 0xac300000-0xac3fffff]
[ 2.834084] pci 0000:80:01.1: bridge window [mem 0xac200000-0xac2fffff 64bit pref]
[ 2.841833] pci 0000:82:00.0: PCI bridge to [bus 83]
[ 2.846810] pci 0000:82:00.0: bridge window [mem 0xc0000000-0xc08fffff]
[ 2.853604] pci 0000:82:00.0: bridge window [mem 0xab000000-0xabffffff 64bit pref]
[ 2.861355] pci 0000:80:01.2: PCI bridge to [bus 82-83]
[ 2.866595] pci 0000:80:01.2: bridge window [mem 0xc0000000-0xc08fffff]
[ 2.873388] pci 0000:80:01.2: bridge window [mem 0xab000000-0xabffffff 64bit pref]
[ 2.881140] pci 0000:84:00.0: BAR 6: no space for [mem size 0x00040000 pref]
[ 2.888192] pci 0000:84:00.0: BAR 6: failed to assign [mem size 0x00040000 pref]
[ 2.895594] pci 0000:80:03.1: PCI bridge to [bus 84]
[ 2.900567] pci 0000:80:03.1: bridge window [io 0x8000-0x8fff]
[ 2.906670] pci 0000:80:03.1: bridge window [mem 0xc0d00000-0xc0dfffff]
[ 2.913463] pci 0000:80:03.1: bridge window [mem 0xac000000-0xac1fffff 64bit pref]
[ 2.921214] pci 0000:80:07.1: PCI bridge to [bus 85]
[ 2.926187] pci 0000:80:07.1: bridge window [mem 0xc0b00000-0xc0cfffff]
[ 2.932986] pci 0000:80:08.1: PCI bridge to [bus 86]
[ 2.937966] pci 0000:80:08.1: bridge window [mem 0xc0900000-0xc0afffff]
[ 2.944764] pci_bus 0000:80: resource 4 [io 0x03b0-0x03df window]
[ 2.944766] pci_bus 0000:80: resource 5 [mem 0x000a0000-0x000bffff window]
[ 2.944767] pci_bus 0000:80: resource 6 [io 0x8000-0xbfff window]
[ 2.944769] pci_bus 0000:80: resource 7 [mem 0xab000000-0xc5ffffff window]
[ 2.944771] pci_bus 0000:80: resource 8 [mem 0x47e80000000-0x63dbfffffff window]
[ 2.944773] pci_bus 0000:81: resource 1 [mem 0xac300000-0xac3fffff]
[ 2.944774] pci_bus 0000:81: resource 2 [mem 0xac200000-0xac2fffff 64bit pref]
[ 2.944776] pci_bus 0000:82: resource 1 [mem 0xc0000000-0xc08fffff]
[ 2.944778] pci_bus 0000:82: resource 2 [mem 0xab000000-0xabffffff 64bit pref]
[ 2.944779] pci_bus 0000:83: resource 1 [mem 0xc0000000-0xc08fffff]
[ 2.944781] pci_bus 0000:83: resource 2 [mem 0xab000000-0xabffffff 64bit pref]
[ 2.944783] pci_bus 0000:84: resource 0 [io 0x8000-0x8fff]
[ 2.944784] pci_bus 0000:84: resource 1 [mem 0xc0d00000-0xc0dfffff]
[ 2.944786] pci_bus 0000:84: resource 2 [mem 0xac000000-0xac1fffff 64bit pref]
[ 2.944788] pci_bus 0000:85: resource 1 [mem 0xc0b00000-0xc0cfffff]
[ 2.944789] pci_bus 0000:86: resource 1 [mem 0xc0900000-0xc0afffff]
[ 2.944805] pci 0000:c1:00.0: BAR 6: no space for [mem size 0x00100000 pref]
[ 2.951857] pci 0000:c1:00.0: BAR 6: failed to assign [mem size 0x00100000 pref]
[ 2.959260] pci 0000:c0:01.1: PCI bridge to [bus c1]
[ 2.964236] pci 0000:c0:01.1: bridge window [io 0xc000-0xcfff]
[ 2.970338] pci 0000:c0:01.1: bridge window [mem 0xa5400000-0xa55fffff]
[ 2.977134] pci 0000:c0:07.1: PCI bridge to [bus c2]
[ 2.982106] pci 0000:c0:07.1: bridge window [mem 0xa5200000-0xa53fffff]
[ 2.988903] pci 0000:c0:08.1: PCI bridge to [bus c3]
[ 2.993878] pci 0000:c0:08.1: bridge window [mem 0xa5000000-0xa51fffff]
[ 3.000673] pci_bus 0000:c0: resource 4 [io 0xc000-0xffff window]
[ 3.000675] pci_bus 0000:c0: resource 5 [mem 0x90000000-0xaaffffff window]
[ 3.000676] pci_bus 0000:c0: resource 6 [mem 0x63dc0000000-0x7ffffffffff window]
[ 3.000678] pci_bus 0000:c1: resource 0 [io 0xc000-0xcfff]
[ 3.000680] pci_bus 0000:c1: resource 1 [mem 0xa5400000-0xa55fffff]
[ 3.000681] pci_bus 0000:c2: resource 1 [mem 0xa5200000-0xa53fffff]
[ 3.000683] pci_bus 0000:c3: resource 1 [mem 0xa5000000-0xa51fffff]
[ 3.000769] NET: Registered protocol family 2
[ 3.005812] TCP established hash table entries: 524288 (order: 10, 4194304 bytes)
[ 3.013953] TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
[ 3.020779] TCP: Hash tables configured (established 524288 bind 65536)
[ 3.027417] TCP: reno registered
[ 3.030757] UDP hash table entries: 65536 (order: 9, 2097152 bytes)
[ 3.037358] UDP-Lite hash table entries: 65536 (order: 9, 2097152 bytes)
[ 3.044553] NET: Registered protocol family 1
[ 3.049366] pci 0000:83:00.0: Boot video device
[ 3.049403] PCI: CLS 64 bytes, default 64
[ 3.049456] Unpacking initramfs...
[ 3.320231] Freeing initrd memory: 19740k freed
[ 3.326950] AMD-Vi: IOMMU performance counters supported
[ 3.332338] AMD-Vi: IOMMU performance counters supported
[ 3.337690] AMD-Vi: IOMMU performance counters supported
[ 3.343048] AMD-Vi: IOMMU performance counters supported
[ 3.349688] iommu: Adding device 0000:00:01.0 to group 0
[ 3.355688] iommu: Adding device 0000:00:02.0 to group 1
[ 3.361699] iommu: Adding device 0000:00:03.0 to group 2
[ 3.367796] iommu: Adding device 0000:00:03.1 to group 3
[ 3.373830] iommu: Adding device 0000:00:04.0 to group 4
[ 3.379843] iommu: Adding device 0000:00:07.0 to group 5
[ 3.385872] iommu: Adding device 0000:00:07.1 to group 6
[ 3.391906] iommu: Adding device 0000:00:08.0 to group 7
[ 3.397894] iommu: Adding device 0000:00:08.1 to group 8
[ 3.403917] iommu: Adding device 0000:00:14.0 to group 9
[ 3.409253] iommu: Adding device 0000:00:14.3 to group 9
[ 3.415338] iommu: Adding device 0000:00:18.0 to group 10
[ 3.420765] iommu: Adding device 0000:00:18.1 to group 10
[ 3.426189] iommu: Adding device 0000:00:18.2 to group 10
[ 3.431616] iommu: Adding device 0000:00:18.3 to group 10
[ 3.437040] iommu: Adding device 0000:00:18.4 to group 10
[ 3.442465] iommu: Adding device 0000:00:18.5 to group 10
[ 3.447892] iommu: Adding device 0000:00:18.6 to group 10
[ 3.453319] iommu: Adding device 0000:00:18.7 to group 10
[ 3.459517] iommu: Adding device 0000:00:19.0 to group 11
[ 3.464940] iommu: Adding device 0000:00:19.1 to group 11
[ 3.470367] iommu: Adding device 0000:00:19.2 to group 11
[ 3.475789] iommu: Adding device 0000:00:19.3 to group 11
[ 3.481215] iommu: Adding device 0000:00:19.4 to group 11
[ 3.486643] iommu: Adding device 0000:00:19.5 to group 11
[ 3.492070] iommu: Adding device 0000:00:19.6 to group 11
[ 3.497493] iommu: Adding device 0000:00:19.7 to group 11
[ 3.503662] iommu: Adding device 0000:00:1a.0 to group 12
[ 3.509092] iommu: Adding device 0000:00:1a.1 to group 12
[ 3.514513] iommu: Adding device 0000:00:1a.2 to group 12
[ 3.519939] iommu: Adding device 0000:00:1a.3 to group 12
[ 3.525368] iommu: Adding device 0000:00:1a.4 to group 12
[ 3.530793] iommu: Adding device 0000:00:1a.5 to group 12
[ 3.536222] iommu: Adding device 0000:00:1a.6 to group 12
[ 3.541651] iommu: Adding device 0000:00:1a.7 to group 12
[ 3.547855] iommu: Adding device 0000:00:1b.0 to group 13
[ 3.553283] iommu: Adding device 0000:00:1b.1 to group 13
[ 3.558708] iommu: Adding device 0000:00:1b.2 to group 13
[ 3.564135] iommu: Adding device 0000:00:1b.3 to group 13
[ 3.569561] iommu: Adding device 0000:00:1b.4 to group 13
[ 3.574986] iommu: Adding device 0000:00:1b.5 to group 13
[ 3.580410] iommu: Adding device 0000:00:1b.6 to group 13
[ 3.585840] iommu: Adding device 0000:00:1b.7 to group 13
[ 3.591998] iommu: Adding device 0000:01:00.0 to group 14
[ 3.598080] iommu: Adding device 0000:02:00.0 to group 15
[ 3.604168] iommu: Adding device 0000:02:00.2 to group 16
[ 3.610246] iommu: Adding device 0000:02:00.3 to group 17
[ 3.616330] iommu: Adding device 0000:03:00.0 to group 18
[ 3.622428] iommu: Adding device 0000:03:00.1 to group 19
[ 3.628541] iommu: Adding device 0000:40:01.0 to group 20
[ 3.634607] iommu: Adding device 0000:40:02.0 to group 21
[ 3.640675] iommu: Adding device 0000:40:03.0 to group 22
[ 3.646782] iommu: Adding device 0000:40:04.0 to group 23
[ 3.652862] iommu: Adding device 0000:40:07.0 to group 24
[ 3.658905] iommu: Adding device 0000:40:07.1 to group 25
[ 3.664980] iommu: Adding device 0000:40:08.0 to group 26
[ 3.671004] iommu: Adding device 0000:40:08.1 to group 27
[ 3.677065] iommu: Adding device 0000:41:00.0 to group 28
[ 3.683112] iommu: Adding device 0000:41:00.2 to group 29
[ 3.689114] iommu: Adding device 0000:41:00.3 to group 30
[ 3.695158] iommu: Adding device 0000:42:00.0 to group 31
[ 3.701190] iommu: Adding device 0000:42:00.1 to group 32
[ 3.707208] iommu: Adding device 0000:80:01.0 to group 33
[ 3.713225] iommu: Adding device 0000:80:01.1 to group 34
[ 3.719402] iommu: Adding device 0000:80:01.2 to group 35
[ 3.725465] iommu: Adding device 0000:80:02.0 to group 36
[ 3.731514] iommu: Adding device 0000:80:03.0 to group 37
[ 3.737544] iommu: Adding device 0000:80:03.1 to group 38
[ 3.743610] iommu: Adding device 0000:80:04.0 to group 39
[ 3.749630] iommu: Adding device 0000:80:07.0 to group 40
[ 3.755717] iommu: Adding device 0000:80:07.1 to group 41
[ 3.761747] iommu: Adding device 0000:80:08.0 to group 42
[ 3.767753] iommu: Adding device 0000:80:08.1 to group 43
[ 3.773789] iommu: Adding device 0000:81:00.0 to group 44
[ 3.779230] iommu: Adding device 0000:81:00.1 to group 44
[ 3.785262] iommu: Adding device 0000:82:00.0 to group 45
[ 3.790675] iommu: Adding device 0000:83:00.0 to group 45
[ 3.796722] iommu: Adding device 0000:84:00.0 to group 46
[ 3.802752] iommu: Adding device 0000:85:00.0 to group 47
[ 3.808785] iommu: Adding device 0000:85:00.2 to group 48
[ 3.814789] iommu: Adding device 0000:86:00.0 to group 49
[ 3.820829] iommu: Adding device 0000:86:00.1 to group 50
[ 3.826861] iommu: Adding device 0000:86:00.2 to group 51
[ 3.832909] iommu: Adding device 0000:c0:01.0 to group 52
[ 3.838964] iommu: Adding device 0000:c0:01.1 to group 53
[ 3.845055] iommu: Adding device 0000:c0:02.0 to group 54
[ 3.851129] iommu: Adding device 0000:c0:03.0 to group 55
[ 3.857226] iommu: Adding device 0000:c0:04.0 to group 56
[ 3.863316] iommu: Adding device 0000:c0:07.0 to group 57
[ 3.869326] iommu: Adding device 0000:c0:07.1 to group 58
[ 3.875434] iommu: Adding device 0000:c0:08.0 to group 59
[ 3.881487] iommu: Adding device 0000:c0:08.1 to group 60
device 0000:c1:00.0 to group 61 [ 3.896004] iommu: Adding device 0000:c2:00.0 to group 62 [ 3.902040] iommu: Adding device 0000:c2:00.2 to group 63 [ 3.908150] iommu: Adding device 0000:c3:00.0 to group 64 [ 3.914173] iommu: Adding device 0000:c3:00.1 to group 65 [ 3.919770] AMD-Vi: Found IOMMU at 0000:00:00.2 cap 0x40 [ 3.925095] AMD-Vi: Extended features (0xf77ef22294ada): [ 3.930413] PPR NX GT IA GA PC GA_vAPIC [ 3.934549] AMD-Vi: Found IOMMU at 0000:40:00.2 cap 0x40 [ 3.939870] AMD-Vi: Extended features (0xf77ef22294ada): [ 3.945190] PPR NX GT IA GA PC GA_vAPIC [ 3.949335] AMD-Vi: Found IOMMU at 0000:80:00.2 cap 0x40 [ 3.954656] AMD-Vi: Extended features (0xf77ef22294ada): [ 3.959976] PPR NX GT IA GA PC GA_vAPIC [ 3.964119] AMD-Vi: Found IOMMU at 0000:c0:00.2 cap 0x40 [ 3.969442] AMD-Vi: Extended features (0xf77ef22294ada): [ 3.974762] PPR NX GT IA GA PC GA_vAPIC [ 3.978906] AMD-Vi: Interrupt remapping enabled [ 3.983443] AMD-Vi: virtual APIC enabled [ 3.987441] pci 0000:00:00.2: irq 26 for MSI/MSI-X [ 3.987542] pci 0000:40:00.2: irq 27 for MSI/MSI-X [ 3.987627] pci 0000:80:00.2: irq 28 for MSI/MSI-X [ 3.987712] pci 0000:c0:00.2: irq 29 for MSI/MSI-X [ 3.987766] AMD-Vi: Lazy IO/TLB flushing enabled [ 3.994099] perf: AMD NB counters detected [ 3.998245] perf: AMD LLC counters detected [ 4.008422] sha1_ssse3: Using SHA-NI optimized SHA-1 implementation [ 4.014778] sha256_ssse3: Using SHA-256-NI optimized SHA-256 implementation [ 4.023328] futex hash table entries: 32768 (order: 9, 2097152 bytes) [ 4.029960] Initialise system trusted keyring [ 4.034367] audit: initializing netlink socket (disabled) [ 4.039786] type=2000 audit(1572832374.193:1): initialized [ 4.070581] HugeTLB registered 1 GB page size, pre-allocated 0 pages [ 4.076946] HugeTLB registered 2 MB page size, pre-allocated 0 pages [ 4.084555] zpool: loaded [ 4.087190] zbud: loaded [ 4.090099] VFS: Disk quotas dquot_6.6.0 [ 4.094133] Dquot-cache hash table entries: 512 (order 0, 4096 bytes) [ 4.100951] msgmni has been set to 32768 [ 4.104982] Key type big_key registered [ 4.108831] SELinux: Registering netfilter hooks [ 4.111253] NET: Registered protocol family 38 [ 4.115711] Key type asymmetric registered [ 4.119820] Asymmetric key parser 'x509' registered [ 4.124756] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 248) [ 4.132301] io scheduler noop registered [ 4.136237] io scheduler deadline registered (default) [ 4.141418] io scheduler cfq registered [ 4.145267] io scheduler mq-deadline registered [ 4.149809] io scheduler kyber registered [ 4.154551] pcieport 0000:00:03.1: irq 30 for MSI/MSI-X [ 4.154720] pcieport 0000:00:07.1: irq 31 for MSI/MSI-X [ 4.155686] pcieport 0000:00:08.1: irq 33 for MSI/MSI-X [ 4.156668] pcieport 0000:40:07.1: irq 34 for MSI/MSI-X [ 4.156995] pcieport 0000:40:08.1: irq 36 for MSI/MSI-X [ 4.158223] pcieport 0000:80:01.1: irq 37 for MSI/MSI-X [ 4.158475] pcieport 0000:80:01.2: irq 38 for MSI/MSI-X [ 4.159171] pcieport 0000:80:03.1: irq 39 for MSI/MSI-X [ 4.159467] pcieport 0000:80:07.1: irq 41 for MSI/MSI-X [ 4.160198] pcieport 0000:80:08.1: irq 43 for MSI/MSI-X [ 4.160517] pcieport 0000:c0:01.1: irq 44 for MSI/MSI-X [ 4.161262] pcieport 0000:c0:07.1: irq 46 for MSI/MSI-X [ 4.161500] pcieport 0000:c0:08.1: irq 48 for MSI/MSI-X [ 4.161611] pcieport 0000:00:03.1: Signaling PME through PCIe PME interrupt [ 4.168582] pci 0000:01:00.0: Signaling PME through PCIe PME interrupt [ 4.175117] pcie_pme 0000:00:03.1:pcie001: service driver pcie_pme loaded [ 4.175128] pcieport 0000:00:07.1: 
Signaling PME through PCIe PME interrupt [ 4.182091] pci 0000:02:00.0: Signaling PME through PCIe PME interrupt [ 4.188626] pci 0000:02:00.2: Signaling PME through PCIe PME interrupt [ 4.195160] pci 0000:02:00.3: Signaling PME through PCIe PME interrupt [ 4.201698] pcie_pme 0000:00:07.1:pcie001: service driver pcie_pme loaded [ 4.201709] pcieport 0000:00:08.1: Signaling PME through PCIe PME interrupt [ 4.208672] pci 0000:03:00.0: Signaling PME through PCIe PME interrupt [ 4.215206] pci 0000:03:00.1: Signaling PME through PCIe PME interrupt [ 4.221743] pcie_pme 0000:00:08.1:pcie001: service driver pcie_pme loaded [ 4.221764] pcieport 0000:40:07.1: Signaling PME through PCIe PME interrupt [ 4.228728] pci 0000:41:00.0: Signaling PME through PCIe PME interrupt [ 4.235262] pci 0000:41:00.2: Signaling PME through PCIe PME interrupt [ 4.241798] pci 0000:41:00.3: Signaling PME through PCIe PME interrupt [ 4.248333] pcie_pme 0000:40:07.1:pcie001: service driver pcie_pme loaded [ 4.248348] pcieport 0000:40:08.1: Signaling PME through PCIe PME interrupt [ 4.255319] pci 0000:42:00.0: Signaling PME through PCIe PME interrupt [ 4.261852] pci 0000:42:00.1: Signaling PME through PCIe PME interrupt [ 4.268389] pcie_pme 0000:40:08.1:pcie001: service driver pcie_pme loaded [ 4.268405] pcieport 0000:80:01.1: Signaling PME through PCIe PME interrupt [ 4.275374] pci 0000:81:00.0: Signaling PME through PCIe PME interrupt [ 4.281909] pci 0000:81:00.1: Signaling PME through PCIe PME interrupt [ 4.288445] pcie_pme 0000:80:01.1:pcie001: service driver pcie_pme loaded [ 4.288460] pcieport 0000:80:01.2: Signaling PME through PCIe PME interrupt [ 4.295429] pci 0000:82:00.0: Signaling PME through PCIe PME interrupt [ 4.301963] pci 0000:83:00.0: Signaling PME through PCIe PME interrupt [ 4.308500] pcie_pme 0000:80:01.2:pcie001: service driver pcie_pme loaded [ 4.308514] pcieport 0000:80:03.1: Signaling PME through PCIe PME interrupt [ 4.315485] pci 0000:84:00.0: Signaling PME through PCIe PME interrupt [ 4.322021] pcie_pme 0000:80:03.1:pcie001: service driver pcie_pme loaded [ 4.322036] pcieport 0000:80:07.1: Signaling PME through PCIe PME interrupt [ 4.329005] pci 0000:85:00.0: Signaling PME through PCIe PME interrupt [ 4.335540] pci 0000:85:00.2: Signaling PME through PCIe PME interrupt [ 4.342076] pcie_pme 0000:80:07.1:pcie001: service driver pcie_pme loaded [ 4.342093] pcieport 0000:80:08.1: Signaling PME through PCIe PME interrupt [ 4.349061] pci 0000:86:00.0: Signaling PME through PCIe PME interrupt [ 4.355594] pci 0000:86:00.1: Signaling PME through PCIe PME interrupt [ 4.362130] pci 0000:86:00.2: Signaling PME through PCIe PME interrupt [ 4.368664] pcie_pme 0000:80:08.1:pcie001: service driver pcie_pme loaded [ 4.368680] pcieport 0000:c0:01.1: Signaling PME through PCIe PME interrupt [ 4.375649] pci 0000:c1:00.0: Signaling PME through PCIe PME interrupt [ 4.382185] pcie_pme 0000:c0:01.1:pcie001: service driver pcie_pme loaded [ 4.382198] pcieport 0000:c0:07.1: Signaling PME through PCIe PME interrupt [ 4.389161] pci 0000:c2:00.0: Signaling PME through PCIe PME interrupt [ 4.395696] pci 0000:c2:00.2: Signaling PME through PCIe PME interrupt [ 4.402232] pcie_pme 0000:c0:07.1:pcie001: service driver pcie_pme loaded [ 4.402245] pcieport 0000:c0:08.1: Signaling PME through PCIe PME interrupt [ 4.409208] pci 0000:c3:00.0: Signaling PME through PCIe PME interrupt [ 4.415742] pci 0000:c3:00.1: Signaling PME through PCIe PME interrupt [ 4.422279] pcie_pme 0000:c0:08.1:pcie001: service driver pcie_pme loaded [ 4.422299] 
pci_hotplug: PCI Hot Plug PCI Core version: 0.5 [ 4.427882] pciehp: PCI Express Hot Plug Controller Driver version: 0.4 [ 4.434559] shpchp: Standard Hot Plug PCI Controller Driver version: 0.4 [ 4.441367] efifb: probing for efifb [ 4.444962] efifb: framebuffer at 0xab000000, mapped to 0xffffbc2b59800000, using 3072k, total 3072k [ 4.454093] efifb: mode is 1024x768x32, linelength=4096, pages=1 [ 4.460107] efifb: scrolling: redraw [ 4.463697] efifb: Truecolor: size=8:8:8:8, shift=24:16:8:0 [ 4.485535] Console: switching to colour frame buffer device 128x48 [ 4.507692] fb0: EFI VGA frame buffer device [ 4.512069] input: Power Button as /devices/LNXSYSTM:00/device:00/PNP0C0C:00/input/input0 [ 4.520253] ACPI: Power Button [PWRB] [ 4.523974] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input1 [ 4.531380] ACPI: Power Button [PWRF] [ 4.536238] GHES: APEI firmware first mode is enabled by APEI bit and WHEA _OSC. [ 4.543724] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled [ 4.570918] 00:02: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A [ 4.597457] 00:03: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A [ 4.603491] Non-volatile memory driver v1.3 [ 4.607719] Linux agpgart interface v0.103 [ 4.613475] crash memory driver: version 1.1 [ 4.617985] rdac: device handler registered [ 4.622231] hp_sw: device handler registered [ 4.626518] emc: device handler registered [ 4.630778] alua: device handler registered [ 4.635010] libphy: Fixed MDIO Bus: probed [ 4.639169] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver [ 4.645707] ehci-pci: EHCI PCI platform driver [ 4.650175] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver [ 4.656366] ohci-pci: OHCI PCI platform driver [ 4.660832] uhci_hcd: USB Universal Host Controller Interface driver [ 4.667303] xhci_hcd 0000:02:00.3: xHCI Host Controller [ 4.672593] xhci_hcd 0000:02:00.3: new USB bus registered, assigned bus number 1 [ 4.680102] xhci_hcd 0000:02:00.3: hcc params 0x0270f665 hci version 0x100 quirks 0x00000410 [ 4.688579] xhci_hcd 0000:02:00.3: irq 50 for MSI/MSI-X [ 4.688605] xhci_hcd 0000:02:00.3: irq 51 for MSI/MSI-X [ 4.688623] xhci_hcd 0000:02:00.3: irq 52 for MSI/MSI-X [ 4.688643] xhci_hcd 0000:02:00.3: irq 53 for MSI/MSI-X [ 4.688662] xhci_hcd 0000:02:00.3: irq 54 for MSI/MSI-X [ 4.688683] xhci_hcd 0000:02:00.3: irq 55 for MSI/MSI-X [ 4.688702] xhci_hcd 0000:02:00.3: irq 56 for MSI/MSI-X [ 4.688722] xhci_hcd 0000:02:00.3: irq 57 for MSI/MSI-X [ 4.688849] usb usb1: New USB device found, idVendor=1d6b, idProduct=0002 [ 4.695645] usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1 [ 4.702871] usb usb1: Product: xHCI Host Controller [ 4.707761] usb usb1: Manufacturer: Linux 3.10.0-957.27.2.el7_lustre.pl1.x86_64 xhci-hcd [ 4.715855] usb usb1: SerialNumber: 0000:02:00.3 [ 4.720588] hub 1-0:1.0: USB hub found [ 4.724352] hub 1-0:1.0: 2 ports detected [ 4.728593] xhci_hcd 0000:02:00.3: xHCI Host Controller [ 4.733881] xhci_hcd 0000:02:00.3: new USB bus registered, assigned bus number 2 [ 4.741297] usb usb2: We don't know the algorithms for LPM for this host, disabling LPM. 
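The long run of "iommu: Adding device ... to group N" lines above shows the AMD-Vi driver carving the PCI topology into IOMMU groups: functions that cannot be isolated from one another (for example the 00:18.x siblings in group 10, or the 82:00.0/83:00.0 pair in group 45) land in a shared group, and a group is the smallest unit that can be passed through to a guest. The same mapping is visible at runtime under /sys/kernel/iommu_groups; the minimal sketch below reconstructs it, assuming only that standard sysfs layout.

#!/usr/bin/env python3
# Minimal sketch: rebuild the "iommu: Adding device ... to group N"
# mapping from sysfs on the running system. Assumes the standard
# /sys/kernel/iommu_groups/<group>/devices/<bdf> layout.
import os

GROUPS = "/sys/kernel/iommu_groups"

for group in sorted(os.listdir(GROUPS), key=int):
    devdir = os.path.join(GROUPS, group, "devices")
    for bdf in sorted(os.listdir(devdir)):
        print(f"group {int(group):3d}: {bdf}")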
[ 4.749409] usb usb2: New USB device found, idVendor=1d6b, idProduct=0003 [ 4.756209] usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1 [ 4.763436] usb usb2: Product: xHCI Host Controller [ 4.768326] usb usb2: Manufacturer: Linux 3.10.0-957.27.2.el7_lustre.pl1.x86_64 xhci-hcd [ 4.776420] usb usb2: SerialNumber: 0000:02:00.3 [ 4.781140] hub 2-0:1.0: USB hub found [ 4.784913] hub 2-0:1.0: 2 ports detected [ 4.789230] xhci_hcd 0000:41:00.3: xHCI Host Controller [ 4.794536] xhci_hcd 0000:41:00.3: new USB bus registered, assigned bus number 3 [ 4.802041] xhci_hcd 0000:41:00.3: hcc params 0x0270f665 hci version 0x100 quirks 0x00000410 [ 4.810524] xhci_hcd 0000:41:00.3: irq 59 for MSI/MSI-X [ 4.810542] xhci_hcd 0000:41:00.3: irq 60 for MSI/MSI-X [ 4.810561] xhci_hcd 0000:41:00.3: irq 61 for MSI/MSI-X [ 4.810580] xhci_hcd 0000:41:00.3: irq 62 for MSI/MSI-X [ 4.810599] xhci_hcd 0000:41:00.3: irq 63 for MSI/MSI-X [ 4.810623] xhci_hcd 0000:41:00.3: irq 64 for MSI/MSI-X [ 4.810642] xhci_hcd 0000:41:00.3: irq 65 for MSI/MSI-X [ 4.810663] xhci_hcd 0000:41:00.3: irq 66 for MSI/MSI-X [ 4.810812] usb usb3: New USB device found, idVendor=1d6b, idProduct=0002 [ 4.817605] usb usb3: New USB device strings: Mfr=3, Product=2, SerialNumber=1 [ 4.824834] usb usb3: Product: xHCI Host Controller [ 4.829721] usb usb3: Manufacturer: Linux 3.10.0-957.27.2.el7_lustre.pl1.x86_64 xhci-hcd [ 4.837816] usb usb3: SerialNumber: 0000:41:00.3 [ 4.842550] hub 3-0:1.0: USB hub found [ 4.846315] hub 3-0:1.0: 2 ports detected [ 4.850580] xhci_hcd 0000:41:00.3: xHCI Host Controller [ 4.855860] xhci_hcd 0000:41:00.3: new USB bus registered, assigned bus number 4 [ 4.863301] usb usb4: We don't know the algorithms for LPM for this host, disabling LPM. [ 4.871406] usb usb4: New USB device found, idVendor=1d6b, idProduct=0003 [ 4.878207] usb usb4: New USB device strings: Mfr=3, Product=2, SerialNumber=1 [ 4.885432] usb usb4: Product: xHCI Host Controller [ 4.890320] usb usb4: Manufacturer: Linux 3.10.0-957.27.2.el7_lustre.pl1.x86_64 xhci-hcd [ 4.898415] usb usb4: SerialNumber: 0000:41:00.3 [ 4.903128] hub 4-0:1.0: USB hub found [ 4.906894] hub 4-0:1.0: 2 ports detected [ 4.911159] usbcore: registered new interface driver usbserial_generic [ 4.917705] usbserial: USB Serial support registered for generic [ 4.923755] i8042: PNP: No PS/2 controller found. Probing ports directly. 
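The xhci_hcd entries above register two host controllers (0000:02:00.3 and 0000:41:00.3), each exposing a USB 2.0 and a USB 3.0 root hub, which is why buses 1-4 all report the Linux Foundation root-hub IDs (1d6b:0002 for 2.0, 1d6b:0003 for 3.0). A minimal sketch of reading the same identifiers back from sysfs, assuming only the standard idVendor/idProduct/product attribute files:

#!/usr/bin/env python3
# Minimal sketch: list vendor/product IDs for enumerated USB devices,
# mirroring the "usb usbN: New USB device found" lines. Devices whose
# descriptors carry no strings (Mfr=0, Product=0, as with the external
# hubs later in this log) simply lack the corresponding sysfs file.
import glob, os

def attr(dev, name):
    try:
        with open(os.path.join(dev, name)) as f:
            return f.read().strip()
    except OSError:
        return "-"

for dev in sorted(glob.glob("/sys/bus/usb/devices/*")):
    if not os.path.exists(os.path.join(dev, "idVendor")):
        continue  # interface nodes such as 1-1:1.0 carry no idVendor
    print(os.path.basename(dev), attr(dev, "idVendor"),
          attr(dev, "idProduct"), attr(dev, "product"))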
[ 5.161928] usb 3-1: new high-speed USB device number 2 using xhci_hcd [ 5.291937] usb 3-1: New USB device found, idVendor=1604, idProduct=10c0 [ 5.298644] usb 3-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0 [ 5.311452] hub 3-1:1.0: USB hub found [ 5.315437] hub 3-1:1.0: 4 ports detected [ 5.961678] i8042: No controller found [ 5.965451] sched: RT throttling activated [ 5.965460] tsc: Refined TSC clocksource calibration: 1996.249 MHz [ 5.965600] mousedev: PS/2 mouse device common for all mice [ 5.965817] rtc_cmos 00:01: RTC can wake from S4 [ 5.966200] rtc_cmos 00:01: rtc core: registered rtc_cmos as rtc0 [ 5.966301] rtc_cmos 00:01: alarms up to one month, y3k, 114 bytes nvram, hpet irqs [ 5.966355] cpuidle: using governor menu [ 5.966605] EFI Variables Facility v0.08 2004-May-17 [ 5.988285] hidraw: raw HID events driver (C) Jiri Kosina [ 5.988390] usbcore: registered new interface driver usbhid [ 5.988390] usbhid: USB HID core driver [ 5.988474] drop_monitor: Initializing network drop monitor service [ 5.988618] TCP: cubic registered [ 5.988622] Initializing XFRM netlink socket [ 5.988822] NET: Registered protocol family 10 [ 5.989301] NET: Registered protocol family 17 [ 5.989306] mpls_gso: MPLS GSO support [ 5.990357] mce: Using 23 MCE banks [ 5.990403] microcode: CPU0: patch_level=0x08001250 [ 5.990415] microcode: CPU1: patch_level=0x08001250 [ 5.990425] microcode: CPU2: patch_level=0x08001250 [ 5.990435] microcode: CPU3: patch_level=0x08001250 [ 5.990449] microcode: CPU4: patch_level=0x08001250 [ 5.990462] microcode: CPU5: patch_level=0x08001250 [ 5.990476] microcode: CPU6: patch_level=0x08001250 [ 5.990490] microcode: CPU7: patch_level=0x08001250 [ 5.990498] microcode: CPU8: patch_level=0x08001250 [ 5.990507] microcode: CPU9: patch_level=0x08001250 [ 5.990518] microcode: CPU10: patch_level=0x08001250 [ 5.990528] microcode: CPU11: patch_level=0x08001250 [ 5.990539] microcode: CPU12: patch_level=0x08001250 [ 5.990548] microcode: CPU13: patch_level=0x08001250 [ 5.990559] microcode: CPU14: patch_level=0x08001250 [ 5.990569] microcode: CPU15: patch_level=0x08001250 [ 5.994123] microcode: CPU16: patch_level=0x08001250 [ 5.994134] microcode: CPU17: patch_level=0x08001250 [ 5.994145] microcode: CPU18: patch_level=0x08001250 [ 5.994155] microcode: CPU19: patch_level=0x08001250 [ 5.994166] microcode: CPU20: patch_level=0x08001250 [ 5.994177] microcode: CPU21: patch_level=0x08001250 [ 5.994188] microcode: CPU22: patch_level=0x08001250 [ 5.994198] microcode: CPU23: patch_level=0x08001250 [ 5.994206] microcode: CPU24: patch_level=0x08001250 [ 5.994214] microcode: CPU25: patch_level=0x08001250 [ 5.994222] microcode: CPU26: patch_level=0x08001250 [ 5.994230] microcode: CPU27: patch_level=0x08001250 [ 5.994238] microcode: CPU28: patch_level=0x08001250 [ 5.994246] microcode: CPU29: patch_level=0x08001250 [ 5.994257] microcode: CPU30: patch_level=0x08001250 [ 5.994268] microcode: CPU31: patch_level=0x08001250 [ 5.994275] microcode: CPU32: patch_level=0x08001250 [ 5.994284] microcode: CPU33: patch_level=0x08001250 [ 5.994295] microcode: CPU34: patch_level=0x08001250 [ 5.994303] microcode: CPU35: patch_level=0x08001250 [ 5.994310] microcode: CPU36: patch_level=0x08001250 [ 5.994319] microcode: CPU37: patch_level=0x08001250 [ 5.994327] microcode: CPU38: patch_level=0x08001250 [ 5.994337] microcode: CPU39: patch_level=0x08001250 [ 5.994345] microcode: CPU40: patch_level=0x08001250 [ 5.994354] microcode: CPU41: patch_level=0x08001250 [ 5.994362] microcode: CPU42: patch_level=0x08001250 [ 
5.994373] microcode: CPU43: patch_level=0x08001250 [ 5.994381] microcode: CPU44: patch_level=0x08001250 [ 5.994390] microcode: CPU45: patch_level=0x08001250 [ 5.994401] microcode: CPU46: patch_level=0x08001250 [ 5.994411] microcode: CPU47: patch_level=0x08001250 [ 5.994461] microcode: Microcode Update Driver: v2.01 , Peter Oruba [ 5.994610] PM: Hibernation image not present or could not be loaded. [ 5.994614] Loading compiled-in X.509 certificates [ 5.994638] Loaded X.509 cert 'CentOS Linux kpatch signing key: ea0413152cde1d98ebdca3fe6f0230904c9ef717' [ 5.994651] Loaded X.509 cert 'CentOS Linux Driver update signing key: 7f421ee0ab69461574bb358861dbe77762a4201b' [ 5.994948] usb 3-1.1: new high-speed USB device number 3 using xhci_hcd [ 5.995028] Loaded X.509 cert 'CentOS Linux kernel signing key: 9e53aba22e464fccc5bb7396174083706426f6e2' [ 5.995044] registered taskstats version 1 [ 5.997154] Key type trusted registered [ 5.998695] Key type encrypted registered [ 5.998739] IMA: No TPM chip found, activating TPM-bypass! (rc=-19) [ 6.000242] Magic number: 11:771:861 [ 6.000301] cpuid cpu21: hash matches [ 6.000354] processor cpu21: hash matches [ 6.000376] memory memory1776: hash matches [ 6.000418] memory memory989: hash matches [ 6.000449] memory memory308: hash matches [ 6.007910] rtc_cmos 00:01: setting system clock to 2019-11-04 01:53:00 UTC (1572832380) [ 6.399576] Switched to clocksource tsc [ 6.404549] Freeing unused kernel memory: 1876k freed [ 6.409832] Write protecting the kernel read-only data: 12288k [ 6.410964] usb 3-1.1: New USB device found, idVendor=1604, idProduct=10c0 [ 6.410965] usb 3-1.1: New USB device strings: Mfr=0, Product=0, SerialNumber=0 [ 6.431277] Freeing unused kernel memory: 504k freed [ 6.431483] hub 3-1.1:1.0: USB hub found [ 6.431836] hub 3-1.1:1.0: 4 ports detected [ 6.445752] Freeing unused kernel memory: 596k freed [ 6.495946] usb 3-1.4: new high-speed USB device number 4 using xhci_hcd [ 6.506058] systemd[1]: systemd 219 running in system mode. (+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 -SECCOMP +BLKID +ELFUTILS +KMOD +IDN) [ 6.514947] usb 1-1: new high-speed USB device number 2 using xhci_hcd [ 6.531674] systemd[1]: Detected architecture x86-64. [ 6.536729] systemd[1]: Running in initial RAM disk. [ 6.550092] systemd[1]: Set hostname to . [ 6.576970] usb 3-1.4: New USB device found, idVendor=1604, idProduct=10c0 [ 6.583852] usb 3-1.4: New USB device strings: Mfr=0, Product=0, SerialNumber=0 [ 6.585428] systemd[1]: Reached target Timers. [ 6.600256] systemd[1]: Created slice Root Slice. [ 6.610051] systemd[1]: Listening on udev Control Socket. [ 6.621037] systemd[1]: Listening on Journal Socket. [ 6.623487] hub 3-1.4:1.0: USB hub found [ 6.623842] hub 3-1.4:1.0: 4 ports detected [ 6.639059] systemd[1]: Created slice System Slice. [ 6.647849] usb 1-1: New USB device found, idVendor=0424, idProduct=2744 [ 6.654842] usb 1-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0 [ 6.661974] usb 1-1: Product: USB2734 [ 6.665637] usb 1-1: Manufacturer: Microchip Tech [ 6.670979] systemd[1]: Starting Create list of required static device nodes for the current kernel... [ 6.689474] systemd[1]: Starting Apply Kernel Variables... [ 6.695127] hub 1-1:1.0: USB hub found [ 6.699850] hub 1-1:1.0: 4 ports detected [ 6.708029] systemd[1]: Listening on udev Kernel Socket. [ 6.719004] systemd[1]: Reached target Local File Systems. [ 6.730454] systemd[1]: Starting dracut cmdline hook... 
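The microcode lines above report the same patch level, 0x08001250, on all 48 logical CPUs; after boot the equivalent per-CPU value is exported as the "microcode" field of /proc/cpuinfo. A minimal check that every CPU agrees:

#!/usr/bin/env python3
# Minimal sketch: tally microcode revisions across CPUs from
# /proc/cpuinfo, confirming the uniform patch_level seen at boot.
from collections import Counter

revs = Counter()
with open("/proc/cpuinfo") as f:
    for line in f:
        if line.startswith("microcode"):
            revs[line.split(":", 1)[1].strip()] += 1

for rev, ncpus in sorted(revs.items()):
    print(f"{ncpus} CPUs at microcode revision {rev}")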
[ 6.740369] systemd[1]: Starting Setup Virtual Console... [ 6.751003] systemd[1]: Reached target Sockets. [ 6.758975] usb 2-1: new SuperSpeed USB device number 2 using xhci_hcd [ 6.774102] usb 2-1: New USB device found, idVendor=0424, idProduct=5744 [ 6.774104] usb 2-1: New USB device strings: Mfr=2, Product=3, SerialNumber=0 [ 6.774105] usb 2-1: Product: USB5734 [ 6.774106] usb 2-1: Manufacturer: Microchip Tech [ 6.797408] hub 2-1:1.0: USB hub found [ 6.801351] hub 2-1:1.0: 4 ports detected [ 6.807549] usb: port power management may be unreliable [ 6.815018] systemd[1]: Reached target Slices. [ 6.824005] systemd[1]: Reached target Swap. [ 6.833419] systemd[1]: Starting Journal Service... [ 6.844387] systemd[1]: Started Create list of required static device nodes for the current kernel. [ 6.863311] systemd[1]: Started Apply Kernel Variables. [ 6.874189] systemd[1]: Started dracut cmdline hook. [ 6.884206] systemd[1]: Started Setup Virtual Console. [ 6.895135] systemd[1]: Started Journal Service. [ 7.008504] pps_core: LinuxPPS API ver. 1 registered [ 7.013477] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti [ 7.026511] PTP clock support registered [ 7.032934] megasas: 07.705.02.00-rh1 [ 7.032962] mlx_compat: loading out-of-tree module taints kernel. [ 7.033118] libata version 3.00 loaded. [ 7.044821] mlx_compat: module verification failed: signature and/or required key missing - tainting kernel [ 7.054912] megaraid_sas 0000:c1:00.0: FW now in Ready state [ 7.060584] megaraid_sas 0000:c1:00.0: 64 bit DMA mask and 32 bit consistent mask [ 7.069261] megaraid_sas 0000:c1:00.0: irq 68 for MSI/MSI-X [ 7.069290] megaraid_sas 0000:c1:00.0: irq 69 for MSI/MSI-X [ 7.069316] megaraid_sas 0000:c1:00.0: irq 70 for MSI/MSI-X [ 7.069343] megaraid_sas 0000:c1:00.0: irq 71 for MSI/MSI-X [ 7.069369] megaraid_sas 0000:c1:00.0: irq 72 for MSI/MSI-X [ 7.069395] megaraid_sas 0000:c1:00.0: irq 73 for MSI/MSI-X [ 7.069423] megaraid_sas 0000:c1:00.0: irq 74 for MSI/MSI-X [ 7.069449] megaraid_sas 0000:c1:00.0: irq 75 for MSI/MSI-X [ 7.069478] megaraid_sas 0000:c1:00.0: irq 76 for MSI/MSI-X [ 7.069504] megaraid_sas 0000:c1:00.0: irq 77 for MSI/MSI-X [ 7.069529] megaraid_sas 0000:c1:00.0: irq 78 for MSI/MSI-X [ 7.069555] megaraid_sas 0000:c1:00.0: irq 79 for MSI/MSI-X [ 7.069590] megaraid_sas 0000:c1:00.0: irq 80 for MSI/MSI-X [ 7.069616] megaraid_sas 0000:c1:00.0: irq 81 for MSI/MSI-X [ 7.069642] megaraid_sas 0000:c1:00.0: irq 82 for MSI/MSI-X [ 7.069667] megaraid_sas 0000:c1:00.0: irq 83 for MSI/MSI-X [ 7.069694] megaraid_sas 0000:c1:00.0: irq 84 for MSI/MSI-X [ 7.069719] megaraid_sas 0000:c1:00.0: irq 85 for MSI/MSI-X [ 7.069745] megaraid_sas 0000:c1:00.0: irq 86 for MSI/MSI-X [ 7.069769] megaraid_sas 0000:c1:00.0: irq 87 for MSI/MSI-X [ 7.069800] megaraid_sas 0000:c1:00.0: irq 88 for MSI/MSI-X [ 7.069826] megaraid_sas 0000:c1:00.0: irq 89 for MSI/MSI-X [ 7.069852] megaraid_sas 0000:c1:00.0: irq 90 for MSI/MSI-X [ 7.069877] megaraid_sas 0000:c1:00.0: irq 91 for MSI/MSI-X [ 7.069910] megaraid_sas 0000:c1:00.0: irq 92 for MSI/MSI-X [ 7.069938] megaraid_sas 0000:c1:00.0: irq 93 for MSI/MSI-X [ 7.069972] megaraid_sas 0000:c1:00.0: irq 94 for MSI/MSI-X [ 7.069998] megaraid_sas 0000:c1:00.0: irq 95 for MSI/MSI-X [ 7.070021] megaraid_sas 0000:c1:00.0: irq 96 for MSI/MSI-X [ 7.070042] megaraid_sas 0000:c1:00.0: irq 97 for MSI/MSI-X [ 7.070066] megaraid_sas 0000:c1:00.0: irq 98 for MSI/MSI-X [ 7.070088] megaraid_sas 0000:c1:00.0: irq 99 for MSI/MSI-X [ 7.070110] megaraid_sas 0000:c1:00.0: irq 100 for 
MSI/MSI-X [ 7.070135] megaraid_sas 0000:c1:00.0: irq 101 for MSI/MSI-X [ 7.070156] megaraid_sas 0000:c1:00.0: irq 102 for MSI/MSI-X [ 7.070178] megaraid_sas 0000:c1:00.0: irq 103 for MSI/MSI-X [ 7.070203] megaraid_sas 0000:c1:00.0: irq 104 for MSI/MSI-X [ 7.070226] megaraid_sas 0000:c1:00.0: irq 105 for MSI/MSI-X [ 7.070250] megaraid_sas 0000:c1:00.0: irq 106 for MSI/MSI-X [ 7.070283] megaraid_sas 0000:c1:00.0: irq 107 for MSI/MSI-X [ 7.070311] megaraid_sas 0000:c1:00.0: irq 108 for MSI/MSI-X [ 7.070339] megaraid_sas 0000:c1:00.0: irq 109 for MSI/MSI-X [ 7.070370] megaraid_sas 0000:c1:00.0: irq 110 for MSI/MSI-X [ 7.070395] megaraid_sas 0000:c1:00.0: irq 111 for MSI/MSI-X [ 7.070419] megaraid_sas 0000:c1:00.0: irq 112 for MSI/MSI-X [ 7.070441] megaraid_sas 0000:c1:00.0: irq 113 for MSI/MSI-X [ 7.070465] megaraid_sas 0000:c1:00.0: irq 114 for MSI/MSI-X [ 7.070490] megaraid_sas 0000:c1:00.0: irq 115 for MSI/MSI-X [ 7.070644] megaraid_sas 0000:c1:00.0: firmware supports msix : (96) [ 7.077022] megaraid_sas 0000:c1:00.0: current msix/online cpus : (48/48) [ 7.077023] megaraid_sas 0000:c1:00.0: RDPQ mode : (disabled) [ 7.077027] megaraid_sas 0000:c1:00.0: Current firmware supports maximum commands: 928 LDIO threshold: 237 [ 7.077326] megaraid_sas 0000:c1:00.0: Configured max firmware commands: 927 [ 7.079775] megaraid_sas 0000:c1:00.0: FW supports sync cache : No [ 7.113249] tg3.c:v3.137 (May 11, 2014) [ 7.117924] Compat-mlnx-ofed backport release: 1c4bf42 [ 7.123085] Backport based on mlnx_ofed/mlnx-ofa_kernel-4.0.git 1c4bf42 [ 7.129720] compat.git: mlnx_ofed/mlnx-ofa_kernel-4.0.git [ 7.130139] tg3 0000:81:00.0 eth0: Tigon3 [partno(BCM95720) rev 5720000] (PCI Express) MAC address 4c:d9:8f:48:5a:bf [ 7.130142] tg3 0000:81:00.0 eth0: attached PHY is 5720C (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[1]) [ 7.130144] tg3 0000:81:00.0 eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] TSOcap[1] [ 7.130146] tg3 0000:81:00.0 eth0: dma_rwctrl[00000001] dma_mask[64-bit] [ 7.152775] tg3 0000:81:00.1 eth1: Tigon3 [partno(BCM95720) rev 5720000] (PCI Express) MAC address 4c:d9:8f:48:5a:c0 [ 7.152778] tg3 0000:81:00.1 eth1: attached PHY is 5720C (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[1]) [ 7.152780] tg3 0000:81:00.1 eth1: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] TSOcap[1] [ 7.152782] tg3 0000:81:00.1 eth1: dma_rwctrl[00000001] dma_mask[64-bit] [ 7.218452] mpt3sas version 31.00.00.00 loaded [ 7.224108] mpt3sas_cm0: 63 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (263564432 kB) [ 7.236160] ahci 0000:86:00.2: version 3.0 [ 7.236840] ahci 0000:86:00.2: irq 120 for MSI/MSI-X [ 7.236847] ahci 0000:86:00.2: irq 121 for MSI/MSI-X [ 7.236852] ahci 0000:86:00.2: irq 122 for MSI/MSI-X [ 7.236857] ahci 0000:86:00.2: irq 123 for MSI/MSI-X [ 7.236861] ahci 0000:86:00.2: irq 124 for MSI/MSI-X [ 7.236866] ahci 0000:86:00.2: irq 125 for MSI/MSI-X [ 7.236870] ahci 0000:86:00.2: irq 126 for MSI/MSI-X [ 7.236874] ahci 0000:86:00.2: irq 127 for MSI/MSI-X [ 7.236879] ahci 0000:86:00.2: irq 128 for MSI/MSI-X [ 7.236883] ahci 0000:86:00.2: irq 129 for MSI/MSI-X [ 7.236888] ahci 0000:86:00.2: irq 130 for MSI/MSI-X [ 7.236892] ahci 0000:86:00.2: irq 131 for MSI/MSI-X [ 7.236897] ahci 0000:86:00.2: irq 132 for MSI/MSI-X [ 7.236901] ahci 0000:86:00.2: irq 133 for MSI/MSI-X [ 7.236906] ahci 0000:86:00.2: irq 134 for MSI/MSI-X [ 7.236910] ahci 0000:86:00.2: irq 135 for MSI/MSI-X [ 7.236980] ahci 0000:86:00.2: AHCI 0001.0301 32 slots 1 ports 6 Gbps 0x1 impl SATA mode [ 7.246473] ahci 0000:86:00.2: flags: 64bit ncq sntf ilck pm 
led clo only pmp fbs pio slum part [ 7.258555] scsi host2: ahci [ 7.262565] ata1: SATA max UDMA/133 abar m4096@0xc0a02000 port 0xc0a02100 irq 120 [ 7.275857] mlx5_core 0000:01:00.0: firmware version: 20.26.1040 [ 7.282878] mlx5_core 0000:01:00.0: 126.016 Gb/s available PCIe bandwidth, limited by 8 GT/s x16 link at 0000:00:03.1 (capable of 252.048 Gb/s with 16 GT/s x16 link) [ 7.312048] mpt3sas_cm0: IOC Number : 0 [ 7.316616] mpt3sas 0000:84:00.0: irq 137 for MSI/MSI-X [ 7.316648] mpt3sas 0000:84:00.0: irq 138 for MSI/MSI-X [ 7.316673] mpt3sas 0000:84:00.0: irq 139 for MSI/MSI-X [ 7.316700] mpt3sas 0000:84:00.0: irq 140 for MSI/MSI-X [ 7.316723] mpt3sas 0000:84:00.0: irq 141 for MSI/MSI-X [ 7.316747] mpt3sas 0000:84:00.0: irq 142 for MSI/MSI-X [ 7.316773] mpt3sas 0000:84:00.0: irq 143 for MSI/MSI-X [ 7.316798] mpt3sas 0000:84:00.0: irq 144 for MSI/MSI-X [ 7.316822] mpt3sas 0000:84:00.0: irq 145 for MSI/MSI-X [ 7.316848] mpt3sas 0000:84:00.0: irq 146 for MSI/MSI-X [ 7.316871] mpt3sas 0000:84:00.0: irq 147 for MSI/MSI-X [ 7.316893] mpt3sas 0000:84:00.0: irq 148 for MSI/MSI-X [ 7.316916] mpt3sas 0000:84:00.0: irq 149 for MSI/MSI-X [ 7.316937] mpt3sas 0000:84:00.0: irq 150 for MSI/MSI-X [ 7.316966] mpt3sas 0000:84:00.0: irq 151 for MSI/MSI-X [ 7.316989] mpt3sas 0000:84:00.0: irq 152 for MSI/MSI-X [ 7.317014] mpt3sas 0000:84:00.0: irq 153 for MSI/MSI-X [ 7.317044] mpt3sas 0000:84:00.0: irq 154 for MSI/MSI-X [ 7.317068] mpt3sas 0000:84:00.0: irq 155 for MSI/MSI-X [ 7.317089] mpt3sas 0000:84:00.0: irq 156 for MSI/MSI-X [ 7.317111] mpt3sas 0000:84:00.0: irq 157 for MSI/MSI-X [ 7.317133] mpt3sas 0000:84:00.0: irq 158 for MSI/MSI-X [ 7.317155] mpt3sas 0000:84:00.0: irq 159 for MSI/MSI-X [ 7.317179] mpt3sas 0000:84:00.0: irq 160 for MSI/MSI-X [ 7.317202] mpt3sas 0000:84:00.0: irq 161 for MSI/MSI-X [ 7.317227] mpt3sas 0000:84:00.0: irq 162 for MSI/MSI-X [ 7.317250] mpt3sas 0000:84:00.0: irq 163 for MSI/MSI-X [ 7.317274] mpt3sas 0000:84:00.0: irq 164 for MSI/MSI-X [ 7.317297] mpt3sas 0000:84:00.0: irq 165 for MSI/MSI-X [ 7.317322] mpt3sas 0000:84:00.0: irq 166 for MSI/MSI-X [ 7.317344] mpt3sas 0000:84:00.0: irq 167 for MSI/MSI-X [ 7.317366] mpt3sas 0000:84:00.0: irq 168 for MSI/MSI-X [ 7.317389] mpt3sas 0000:84:00.0: irq 169 for MSI/MSI-X [ 7.317414] mpt3sas 0000:84:00.0: irq 170 for MSI/MSI-X [ 7.317436] mpt3sas 0000:84:00.0: irq 171 for MSI/MSI-X [ 7.317460] mpt3sas 0000:84:00.0: irq 172 for MSI/MSI-X [ 7.317482] mpt3sas 0000:84:00.0: irq 173 for MSI/MSI-X [ 7.317504] mpt3sas 0000:84:00.0: irq 174 for MSI/MSI-X [ 7.317529] mpt3sas 0000:84:00.0: irq 175 for MSI/MSI-X [ 7.317552] mpt3sas 0000:84:00.0: irq 176 for MSI/MSI-X [ 7.317577] mpt3sas 0000:84:00.0: irq 177 for MSI/MSI-X [ 7.317603] mpt3sas 0000:84:00.0: irq 178 for MSI/MSI-X [ 7.317628] mpt3sas 0000:84:00.0: irq 179 for MSI/MSI-X [ 7.317657] mpt3sas 0000:84:00.0: irq 180 for MSI/MSI-X [ 7.317682] mpt3sas 0000:84:00.0: irq 181 for MSI/MSI-X [ 7.317704] mpt3sas 0000:84:00.0: irq 182 for MSI/MSI-X [ 7.317727] mpt3sas 0000:84:00.0: irq 183 for MSI/MSI-X [ 7.317753] mpt3sas 0000:84:00.0: irq 184 for MSI/MSI-X [ 7.320038] mpt3sas0-msix0: PCI-MSI-X enabled: IRQ 137 [ 7.325188] mpt3sas0-msix1: PCI-MSI-X enabled: IRQ 138 [ 7.330334] mpt3sas0-msix2: PCI-MSI-X enabled: IRQ 139 [ 7.335482] mpt3sas0-msix3: PCI-MSI-X enabled: IRQ 140 [ 7.340631] mpt3sas0-msix4: PCI-MSI-X enabled: IRQ 141 [ 7.340632] mpt3sas0-msix5: PCI-MSI-X enabled: IRQ 142 [ 7.340633] mpt3sas0-msix6: PCI-MSI-X enabled: IRQ 143 [ 7.340633] mpt3sas0-msix7: PCI-MSI-X enabled: IRQ 144 [ 
7.340634] mpt3sas0-msix8: PCI-MSI-X enabled: IRQ 145 [ 7.340635] mpt3sas0-msix9: PCI-MSI-X enabled: IRQ 146 [ 7.340638] mpt3sas0-msix10: PCI-MSI-X enabled: IRQ 147 [ 7.340639] mpt3sas0-msix11: PCI-MSI-X enabled: IRQ 148 [ 7.340640] mpt3sas0-msix12: PCI-MSI-X enabled: IRQ 149 [ 7.340641] mpt3sas0-msix13: PCI-MSI-X enabled: IRQ 150 [ 7.340641] mpt3sas0-msix14: PCI-MSI-X enabled: IRQ 151 [ 7.340642] mpt3sas0-msix15: PCI-MSI-X enabled: IRQ 152 [ 7.340643] mpt3sas0-msix16: PCI-MSI-X enabled: IRQ 153 [ 7.340643] mpt3sas0-msix17: PCI-MSI-X enabled: IRQ 154 [ 7.340644] mpt3sas0-msix18: PCI-MSI-X enabled: IRQ 155 [ 7.340644] mpt3sas0-msix19: PCI-MSI-X enabled: IRQ 156 [ 7.340645] mpt3sas0-msix20: PCI-MSI-X enabled: IRQ 157 [ 7.340645] mpt3sas0-msix21: PCI-MSI-X enabled: IRQ 158 [ 7.340646] mpt3sas0-msix22: PCI-MSI-X enabled: IRQ 159 [ 7.340646] mpt3sas0-msix23: PCI-MSI-X enabled: IRQ 160 [ 7.340647] mpt3sas0-msix24: PCI-MSI-X enabled: IRQ 161 [ 7.340647] mpt3sas0-msix25: PCI-MSI-X enabled: IRQ 162 [ 7.340648] mpt3sas0-msix26: PCI-MSI-X enabled: IRQ 163 [ 7.340648] mpt3sas0-msix27: PCI-MSI-X enabled: IRQ 164 [ 7.340649] mpt3sas0-msix28: PCI-MSI-X enabled: IRQ 165 [ 7.340649] mpt3sas0-msix29: PCI-MSI-X enabled: IRQ 166 [ 7.340650] mpt3sas0-msix30: PCI-MSI-X enabled: IRQ 167 [ 7.340650] mpt3sas0-msix31: PCI-MSI-X enabled: IRQ 168 [ 7.340651] mpt3sas0-msix32: PCI-MSI-X enabled: IRQ 169 [ 7.340651] mpt3sas0-msix33: PCI-MSI-X enabled: IRQ 170 [ 7.340652] mpt3sas0-msix34: PCI-MSI-X enabled: IRQ 171 [ 7.340652] mpt3sas0-msix35: PCI-MSI-X enabled: IRQ 172 [ 7.340653] mpt3sas0-msix36: PCI-MSI-X enabled: IRQ 173 [ 7.340654] mpt3sas0-msix37: PCI-MSI-X enabled: IRQ 174 [ 7.340654] mpt3sas0-msix38: PCI-MSI-X enabled: IRQ 175 [ 7.340655] mpt3sas0-msix39: PCI-MSI-X enabled: IRQ 176 [ 7.340655] mpt3sas0-msix40: PCI-MSI-X enabled: IRQ 177 [ 7.340656] mpt3sas0-msix41: PCI-MSI-X enabled: IRQ 178 [ 7.340656] mpt3sas0-msix42: PCI-MSI-X enabled: IRQ 179 [ 7.340657] mpt3sas0-msix43: PCI-MSI-X enabled: IRQ 180 [ 7.340657] mpt3sas0-msix44: PCI-MSI-X enabled: IRQ 181 [ 7.340658] mpt3sas0-msix45: PCI-MSI-X enabled: IRQ 182 [ 7.340658] mpt3sas0-msix46: PCI-MSI-X enabled: IRQ 183 [ 7.340659] mpt3sas0-msix47: PCI-MSI-X enabled: IRQ 184 [ 7.340661] mpt3sas_cm0: iomem(0x00000000ac000000), mapped(0xffffbc2b5a000000), size(1048576) [ 7.340662] mpt3sas_cm0: ioport(0x0000000000008000), size(256) [ 7.416965] mpt3sas_cm0: IOC Number : 0 [ 7.416968] mpt3sas_cm0: sending message unit reset !! 
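The mlx5_core line further up reports 126.016 Gb/s of available PCIe bandwidth on its 8 GT/s x16 link, against 252.048 Gb/s if the slot ran at 16 GT/s. Both figures are lane rate times width scaled by the 128b/130b encoding overhead used at these speeds; truncating the per-lane rate to whole Mb/s before multiplying reproduces this driver's printout exactly (mainline kernels round slightly differently, so treat that detail as an observation about this log rather than a specification). A worked version of the arithmetic:

#!/usr/bin/env python3
# Worked arithmetic for the mlx5_core PCIe bandwidth message: usable
# bandwidth = per-lane rate x lane count, with 128b/130b encoding
# overhead. Truncating the per-lane figure to whole Mb/s matches this
# driver's output; an illustration, not the driver's actual code.
def pcie_gbps(gt_per_s, lanes):
    per_lane_mbps = int(gt_per_s * 1000 * 128 / 130)  # truncated Mb/s
    return per_lane_mbps * lanes / 1000

print(f"8 GT/s  x16: {pcie_gbps(8, 16):.3f} Gb/s")   # 126.016, as logged
print(f"16 GT/s x16: {pcie_gbps(16, 16):.3f} Gb/s")  # 252.048, as logged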
[ 7.418964] mpt3sas_cm0: message unit reset: SUCCESS [ 7.438972] megaraid_sas 0000:c1:00.0: Init cmd return status SUCCESS for SCSI host 0 [ 7.459968] megaraid_sas 0000:c1:00.0: firmware type : Legacy(64 VD) firmware [ 7.459969] megaraid_sas 0000:c1:00.0: controller type : iMR(0MB) [ 7.459970] megaraid_sas 0000:c1:00.0: Online Controller Reset(OCR) : Enabled [ 7.459971] megaraid_sas 0000:c1:00.0: Secure JBOD support : No [ 7.459972] megaraid_sas 0000:c1:00.0: NVMe passthru support : No [ 7.481508] megaraid_sas 0000:c1:00.0: INIT adapter done [ 7.481510] megaraid_sas 0000:c1:00.0: Jbod map is not supported megasas_setup_jbod_map 5146 [ 7.507849] megaraid_sas 0000:c1:00.0: pci id : (0x1000)/(0x005f)/(0x1028)/(0x1f4b) [ 7.507851] megaraid_sas 0000:c1:00.0: unevenspan support : yes [ 7.507852] megaraid_sas 0000:c1:00.0: firmware crash dump : no [ 7.507853] megaraid_sas 0000:c1:00.0: jbod sync map : no [ 7.507857] scsi host0: Avago SAS based MegaRAID driver [ 7.526986] scsi 0:2:0:0: Direct-Access DELL PERC H330 Mini 4.30 PQ: 0 ANSI: 5 [ 7.556571] mlx5_core 0000:01:00.0: irq 185 for MSI/MSI-X [ 7.556592] mlx5_core 0000:01:00.0: irq 186 for MSI/MSI-X [ 7.556611] mlx5_core 0000:01:00.0: irq 187 for MSI/MSI-X [ 7.556630] mlx5_core 0000:01:00.0: irq 188 for MSI/MSI-X [ 7.556650] mlx5_core 0000:01:00.0: irq 189 for MSI/MSI-X [ 7.556669] mlx5_core 0000:01:00.0: irq 190 for MSI/MSI-X [ 7.556687] mlx5_core 0000:01:00.0: irq 191 for MSI/MSI-X [ 7.556706] mlx5_core 0000:01:00.0: irq 192 for MSI/MSI-X [ 7.556729] mlx5_core 0000:01:00.0: irq 193 for MSI/MSI-X [ 7.556748] mlx5_core 0000:01:00.0: irq 194 for MSI/MSI-X [ 7.556766] mlx5_core 0000:01:00.0: irq 195 for MSI/MSI-X [ 7.556783] mlx5_core 0000:01:00.0: irq 196 for MSI/MSI-X [ 7.556806] mlx5_core 0000:01:00.0: irq 197 for MSI/MSI-X [ 7.556825] mlx5_core 0000:01:00.0: irq 198 for MSI/MSI-X [ 7.556845] mlx5_core 0000:01:00.0: irq 199 for MSI/MSI-X [ 7.556862] mlx5_core 0000:01:00.0: irq 200 for MSI/MSI-X [ 7.556880] mlx5_core 0000:01:00.0: irq 201 for MSI/MSI-X [ 7.556899] mlx5_core 0000:01:00.0: irq 202 for MSI/MSI-X [ 7.556919] mlx5_core 0000:01:00.0: irq 203 for MSI/MSI-X [ 7.556936] mlx5_core 0000:01:00.0: irq 204 for MSI/MSI-X [ 7.556956] mlx5_core 0000:01:00.0: irq 205 for MSI/MSI-X [ 7.556982] mlx5_core 0000:01:00.0: irq 206 for MSI/MSI-X [ 7.557000] mlx5_core 0000:01:00.0: irq 207 for MSI/MSI-X [ 7.557019] mlx5_core 0000:01:00.0: irq 208 for MSI/MSI-X [ 7.557037] mlx5_core 0000:01:00.0: irq 209 for MSI/MSI-X [ 7.557055] mlx5_core 0000:01:00.0: irq 210 for MSI/MSI-X [ 7.557073] mlx5_core 0000:01:00.0: irq 211 for MSI/MSI-X [ 7.557091] mlx5_core 0000:01:00.0: irq 212 for MSI/MSI-X [ 7.557111] mlx5_core 0000:01:00.0: irq 213 for MSI/MSI-X [ 7.557129] mlx5_core 0000:01:00.0: irq 214 for MSI/MSI-X [ 7.557147] mlx5_core 0000:01:00.0: irq 215 for MSI/MSI-X [ 7.557167] mlx5_core 0000:01:00.0: irq 216 for MSI/MSI-X [ 7.557187] mlx5_core 0000:01:00.0: irq 217 for MSI/MSI-X [ 7.557205] mlx5_core 0000:01:00.0: irq 218 for MSI/MSI-X [ 7.557227] mlx5_core 0000:01:00.0: irq 219 for MSI/MSI-X [ 7.557246] mlx5_core 0000:01:00.0: irq 220 for MSI/MSI-X [ 7.557266] mlx5_core 0000:01:00.0: irq 221 for MSI/MSI-X [ 7.557284] mlx5_core 0000:01:00.0: irq 222 for MSI/MSI-X [ 7.557304] mlx5_core 0000:01:00.0: irq 223 for MSI/MSI-X [ 7.557321] mlx5_core 0000:01:00.0: irq 224 for MSI/MSI-X [ 7.557339] mlx5_core 0000:01:00.0: irq 225 for MSI/MSI-X [ 7.557358] mlx5_core 0000:01:00.0: irq 226 for MSI/MSI-X [ 7.557375] mlx5_core 0000:01:00.0: irq 227 for MSI/MSI-X [ 7.557393] 
mlx5_core 0000:01:00.0: irq 228 for MSI/MSI-X [ 7.557412] mlx5_core 0000:01:00.0: irq 229 for MSI/MSI-X [ 7.557429] mlx5_core 0000:01:00.0: irq 230 for MSI/MSI-X [ 7.557448] mlx5_core 0000:01:00.0: irq 231 for MSI/MSI-X [ 7.557467] mlx5_core 0000:01:00.0: irq 232 for MSI/MSI-X [ 7.557487] mlx5_core 0000:01:00.0: irq 233 for MSI/MSI-X [ 7.558527] mlx5_core 0000:01:00.0: Port module event: module 0, Cable plugged [ 7.558787] mlx5_core 0000:01:00.0: mlx5_pcie_event:303:(pid 315): PCIe slot advertised sufficient power (27W). [ 7.566109] mlx5_core 0000:01:00.0: mlx5_fw_tracer_start:776:(pid 317): FWTracer: Ownership granted and active [ 7.576985] ata1: SATA link down (SStatus 0 SControl 300) [ 7.588051] mpt3sas_cm0: Allocated physical memory: size(38831 kB) [ 7.588053] mpt3sas_cm0: Current Controller Queue Depth(7564), Max Controller Queue Depth(7680) [ 7.588054] mpt3sas_cm0: Scatter Gather Elements per IO(128) [ 7.731576] mpt3sas_cm0: FW Package Version(12.00.00.00) [ 7.731829] mpt3sas_cm0: SAS3616: FWVersion(12.00.00.00), ChipRevision(0x02), BiosVersion(00.00.00.00) [ 7.731834] mpt3sas_cm0: Protocol=(Initiator,Target,NVMe), Capabilities=(TLR,EEDP,Diag Trace Buffer,Task Set Full,NCQ) [ 7.731902] mpt3sas 0000:84:00.0: Enabled Extended Tags as Controller Supports [ 7.731917] mpt3sas_cm0: : host protection capabilities enabled DIF1 DIF2 DIF3 [ 7.731928] scsi host1: Fusion MPT SAS Host [ 7.732189] mpt3sas_cm0: sending port enable !! [ 7.732474] mpt3sas_cm0: hba_port entry: ffff9a81b3961740, port: 255 is added to hba_port list [ 7.734944] mpt3sas_cm0: host_add: handle(0x0001), sas_addr(0x500605b00deb48a0), phys(21) [ 7.735660] mpt3sas_cm0: detecting: handle(0x0018), sas_address(0x500a0984dfa1fa24), phy(0) [ 7.735665] mpt3sas_cm0: REPORT_LUNS: handle(0x0018), retries(0) [ 7.736555] mpt3sas_cm0: TEST_UNIT_READY: handle(0x0018), lun(0) [ 7.737099] scsi 1:0:0:0: Direct-Access DELL MD34xx 0825 PQ: 0 ANSI: 5 [ 7.737179] scsi 1:0:0:0: SSP: handle(0x0018), sas_addr(0x500a0984dfa1fa24), phy(0), device_name(0x500a0984dfa1fa24) [ 7.737181] scsi 1:0:0:0: enclosure logical id(0x300605b00d1148a0), slot(13) [ 7.737182] scsi 1:0:0:0: enclosure level(0x0000), connector name( C3 ) [ 7.737183] scsi 1:0:0:0: serial_number(021825001369 ) [ 7.737186] scsi 1:0:0:0: qdepth(254), tagged(1), simple(0), ordered(0), scsi_level(6), cmd_que(1) [ 7.929217] mlx5_ib: Mellanox Connect-IB Infiniband driver v4.7-1.0.0 [ 7.932961] scsi 1:0:0:1: Direct-Access DELL MD34xx 0825 PQ: 0 ANSI: 5 [ 7.933041] scsi 1:0:0:1: SSP: handle(0x0018), sas_addr(0x500a0984dfa1fa24), phy(0), device_name(0x500a0984dfa1fa24) [ 7.933043] scsi 1:0:0:1: enclosure logical id(0x300605b00d1148a0), slot(13) [ 7.933044] scsi 1:0:0:1: enclosure level(0x0000), connector name( C3 ) [ 7.933045] scsi 1:0:0:1: serial_number(021825001369 ) [ 7.933047] scsi 1:0:0:1: qdepth(254), tagged(1), simple(0), ordered(0), scsi_level(6), cmd_que(1) [ 7.933287] scsi 1:0:0:1: Mode parameters changed [ 7.998206] scsi 1:0:0:31: Direct-Access DELL Universal Xport 0825 PQ: 0 ANSI: 5 [ 8.006476] scsi 1:0:0:31: SSP: handle(0x0018), sas_addr(0x500a0984dfa1fa24), phy(0), device_name(0x500a0984dfa1fa24) [ 8.011161] sd 0:2:0:0: [sda] 467664896 512-byte logical blocks: (239 GB/223 GiB) [ 8.011330] sd 0:2:0:0: [sda] Write Protect is off [ 8.011332] sd 0:2:0:0: [sda] Mode Sense: 1f 00 10 08 [ 8.011375] sd 0:2:0:0: [sda] Write cache: disabled, read cache: disabled, supports DPO and FUA [ 8.013491] sda: sda1 sda2 sda3 [ 8.013927] sd 0:2:0:0: [sda] Attached SCSI disk [ 8.045899] scsi 1:0:0:31: 
enclosure logical id(0x300605b00d1148a0), slot(13) [ 8.045900] scsi 1:0:0:31: enclosure level(0x0000), connector name( C3 ) [ 8.045954] scsi 1:0:0:31: serial_number(021825001369 ) [ 8.045957] scsi 1:0:0:31: qdepth(254), tagged(1), simple(0), ordered(0), scsi_level(6), cmd_que(1) [ 8.094253] mpt3sas_cm0: detecting: handle(0x0019), sas_address(0x500a0984dfa20c10), phy(4) [ 8.103663] mpt3sas_cm0: REPORT_LUNS: handle(0x0019), retries(0) [ 8.111803] mpt3sas_cm0: TEST_UNIT_READY: handle(0x0019), lun(0) [ 8.118582] scsi 1:0:1:0: Direct-Access DELL MD34xx 0825 PQ: 0 ANSI: 5 [ 8.126776] scsi 1:0:1:0: SSP: handle(0x0019), sas_addr(0x500a0984dfa20c10), phy(4), device_name(0x500a0984dfa20c10) [ 8.137290] scsi 1:0:1:0: enclosure logical id(0x300605b00d1148a0), slot(9) [ 8.144335] scsi 1:0:1:0: enclosure level(0x0000), connector name( C2 ) [ 8.146953] random: crng init done [ 8.154461] scsi 1:0:1:0: serial_number(021825001558 ) [ 8.159860] scsi 1:0:1:0: qdepth(254), tagged(1), simple(0), ordered(0), scsi_level(6), cmd_que(1) [ 8.182796] scsi 1:0:1:1: Direct-Access DELL MD34xx 0825 PQ: 0 ANSI: 5 [ 8.190969] scsi 1:0:1:1: SSP: handle(0x0019), sas_addr(0x500a0984dfa20c10), phy(4), device_name(0x500a0984dfa20c10) [ 8.201485] scsi 1:0:1:1: enclosure logical id(0x300605b00d1148a0), slot(9) [ 8.208532] scsi 1:0:1:1: enclosure level(0x0000), connector name( C2 ) [ 8.215238] scsi 1:0:1:1: serial_number(021825001558 ) [ 8.220645] scsi 1:0:1:1: qdepth(254), tagged(1), simple(0), ordered(0), scsi_level(6), cmd_que(1) [ 8.246199] scsi 1:0:1:31: Direct-Access DELL Universal Xport 0825 PQ: 0 ANSI: 5 [ 8.254468] scsi 1:0:1:31: SSP: handle(0x0019), sas_addr(0x500a0984dfa20c10), phy(4), device_name(0x500a0984dfa20c10) [ 8.265066] scsi 1:0:1:31: enclosure logical id(0x300605b00d1148a0), slot(9) [ 8.272198] scsi 1:0:1:31: enclosure level(0x0000), connector name( C2 ) [ 8.279006] scsi 1:0:1:31: serial_number(021825001558 ) [ 8.284491] scsi 1:0:1:31: qdepth(254), tagged(1), simple(0), ordered(0), scsi_level(6), cmd_que(1) [ 8.304255] mpt3sas_cm0: detecting: handle(0x0017), sas_address(0x500a0984db2fa924), phy(8) [ 8.312610] mpt3sas_cm0: REPORT_LUNS: handle(0x0017), retries(0) [ 8.319641] mpt3sas_cm0: TEST_UNIT_READY: handle(0x0017), lun(0) [ 8.326193] scsi 1:0:2:0: Direct-Access DELL MD34xx 0825 PQ: 0 ANSI: 5 [ 8.334375] scsi 1:0:2:0: SSP: handle(0x0017), sas_addr(0x500a0984db2fa924), phy(8), device_name(0x500a0984db2fa924) [ 8.344888] scsi 1:0:2:0: enclosure logical id(0x300605b00d1148a0), slot(5) [ 8.351934] scsi 1:0:2:0: enclosure level(0x0000), connector name( C1 ) [ 8.358651] scsi 1:0:2:0: serial_number(021815000354 ) [ 8.364054] scsi 1:0:2:0: qdepth(254), tagged(1), simple(0), ordered(0), scsi_level(6), cmd_que(1) [ 8.384931] scsi 1:0:2:1: Direct-Access DELL MD34xx 0825 PQ: 0 ANSI: 5 [ 8.393093] scsi 1:0:2:1: SSP: handle(0x0017), sas_addr(0x500a0984db2fa924), phy(8), device_name(0x500a0984db2fa924) [ 8.403606] scsi 1:0:2:1: enclosure logical id(0x300605b00d1148a0), slot(5) [ 8.410652] scsi 1:0:2:1: enclosure level(0x0000), connector name( C1 ) [ 8.417370] scsi 1:0:2:1: serial_number(021815000354 ) [ 8.422772] scsi 1:0:2:1: qdepth(254), tagged(1), simple(0), ordered(0), scsi_level(6), cmd_que(1) [ 8.431962] scsi 1:0:2:1: Mode parameters changed [ 8.447188] scsi 1:0:2:2: Direct-Access DELL MD34xx 0825 PQ: 0 ANSI: 5 [ 8.455372] scsi 1:0:2:2: SSP: handle(0x0017), sas_addr(0x500a0984db2fa924), phy(8), device_name(0x500a0984db2fa924) [ 8.465887] scsi 1:0:2:2: enclosure logical id(0x300605b00d1148a0), slot(5) [ 8.472933] scsi 
1:0:2:2: enclosure level(0x0000), connector name( C1 ) [ 8.479652] scsi 1:0:2:2: serial_number(021815000354 ) [ 8.485052] scsi 1:0:2:2: qdepth(254), tagged(1), simple(0), ordered(0), scsi_level(6), cmd_que(1) [ 8.494231] scsi 1:0:2:2: Mode parameters changed [ 8.513191] scsi 1:0:2:31: Direct-Access DELL Universal Xport 0825 PQ: 0 ANSI: 5 [ 8.521457] scsi 1:0:2:31: SSP: handle(0x0017), sas_addr(0x500a0984db2fa924), phy(8), device_name(0x500a0984db2fa924) [ 8.532059] scsi 1:0:2:31: enclosure logical id(0x300605b00d1148a0), slot(5) [ 8.539193] scsi 1:0:2:31: enclosure level(0x0000), connector name( C1 ) [ 8.546002] scsi 1:0:2:31: serial_number(021815000354 ) [ 8.551496] scsi 1:0:2:31: qdepth(254), tagged(1), simple(0), ordered(0), scsi_level(6), cmd_que(1) [ 8.571258] mpt3sas_cm0: detecting: handle(0x001a), sas_address(0x500a0984da0f9b10), phy(12) [ 8.579696] mpt3sas_cm0: REPORT_LUNS: handle(0x001a), retries(0) [ 8.586521] mpt3sas_cm0: TEST_UNIT_READY: handle(0x001a), lun(0) [ 8.593077] scsi 1:0:3:0: Direct-Access DELL MD34xx 0825 PQ: 0 ANSI: 5 [ 8.601258] scsi 1:0:3:0: SSP: handle(0x001a), sas_addr(0x500a0984da0f9b10), phy(12), device_name(0x500a0984da0f9b10) [ 8.611855] scsi 1:0:3:0: enclosure logical id(0x300605b00d1148a0), slot(1) [ 8.618902] scsi 1:0:3:0: enclosure level(0x0000), connector name( C0 ) [ 8.625620] scsi 1:0:3:0: serial_number(021812047179 ) [ 8.631020] scsi 1:0:3:0: qdepth(254), tagged(1), simple(0), ordered(0), scsi_level(6), cmd_que(1) [ 8.653828] scsi 1:0:3:1: Direct-Access DELL MD34xx 0825 PQ: 0 ANSI: 5 [ 8.661996] scsi 1:0:3:1: SSP: handle(0x001a), sas_addr(0x500a0984da0f9b10), phy(12), device_name(0x500a0984da0f9b10) [ 8.672592] scsi 1:0:3:1: enclosure logical id(0x300605b00d1148a0), slot(1) [ 8.679640] scsi 1:0:3:1: enclosure level(0x0000), connector name( C0 ) [ 8.686357] scsi 1:0:3:1: serial_number(021812047179 ) [ 8.691759] scsi 1:0:3:1: qdepth(254), tagged(1), simple(0), ordered(0), scsi_level(6), cmd_que(1) [ 8.714214] scsi 1:0:3:2: Direct-Access DELL MD34xx 0825 PQ: 0 ANSI: 5 [ 8.722375] scsi 1:0:3:2: SSP: handle(0x001a), sas_addr(0x500a0984da0f9b10), phy(12), device_name(0x500a0984da0f9b10) [ 8.732975] scsi 1:0:3:2: enclosure logical id(0x300605b00d1148a0), slot(1) [ 8.740022] scsi 1:0:3:2: enclosure level(0x0000), connector name( C0 ) [ 8.746741] scsi 1:0:3:2: serial_number(021812047179 ) [ 8.752140] scsi 1:0:3:2: qdepth(254), tagged(1), simple(0), ordered(0), scsi_level(6), cmd_que(1) [ 8.772212] scsi 1:0:3:31: Direct-Access DELL Universal Xport 0825 PQ: 0 ANSI: 5 [ 8.780461] scsi 1:0:3:31: SSP: handle(0x001a), sas_addr(0x500a0984da0f9b10), phy(12), device_name(0x500a0984da0f9b10) [ 8.791147] scsi 1:0:3:31: enclosure logical id(0x300605b00d1148a0), slot(1) [ 8.798280] scsi 1:0:3:31: enclosure level(0x0000), connector name( C0 ) [ 8.805086] scsi 1:0:3:31: serial_number(021812047179 ) [ 8.810573] scsi 1:0:3:31: qdepth(254), tagged(1), simple(0), ordered(0), scsi_level(6), cmd_que(1) [ 8.833855] mpt3sas_cm0: detecting: handle(0x0011), sas_address(0x300705b00deb48a0), phy(16) [ 8.842296] mpt3sas_cm0: REPORT_LUNS: handle(0x0011), retries(0) [ 8.848322] mpt3sas_cm0: TEST_UNIT_READY: handle(0x0011), lun(0) [ 8.854697] scsi 1:0:4:0: Enclosure LSI VirtualSES 03 PQ: 0 ANSI: 7 [ 8.862824] scsi 1:0:4:0: set ignore_delay_remove for handle(0x0011) [ 8.869176] scsi 1:0:4:0: SES: handle(0x0011), sas_addr(0x300705b00deb48a0), phy(16), device_name(0x300705b00deb48a0) [ 8.879776] scsi 1:0:4:0: enclosure logical id(0x300605b00d1148a0), slot(16) [ 8.886910] scsi 1:0:4:0: 
enclosure level(0x0000), connector name( C3 ) [ 8.893630] scsi 1:0:4:0: serial_number(300605B00D1148A0) [ 8.899029] scsi 1:0:4:0: qdepth(1), tagged(0), simple(0), ordered(0), scsi_level(8), cmd_que(0) [ 8.907834] mpt3sas_cm0: log_info(0x31200206): originator(PL), code(0x20), sub_code(0x0206) [ 8.941001] mpt3sas_cm0: port enable: SUCCESS [ 8.945832] scsi 1:0:0:0: rdac: LUN 0 (IOSHIP) (owned) [ 8.951240] sd 1:0:0:0: [sdb] 37449707520 512-byte logical blocks: (19.1 TB/17.4 TiB) [ 8.959298] scsi 1:0:0:1: rdac: LUN 1 (IOSHIP) (unowned) [ 8.964790] sd 1:0:0:0: [sdb] Write Protect is off [ 8.964870] sd 1:0:0:1: [sdc] 37449707520 512-byte logical blocks: (19.1 TB/17.4 TiB) [ 8.965244] sd 1:0:0:1: [sdc] Write Protect is off [ 8.965246] sd 1:0:0:1: [sdc] Mode Sense: 83 00 10 08 [ 8.965291] scsi 1:0:1:0: rdac: LUN 0 (IOSHIP) (unowned) [ 8.965386] sd 1:0:0:1: [sdc] Write cache: enabled, read cache: enabled, supports DPO and FUA [ 8.965524] sd 1:0:1:0: [sdd] 37449707520 512-byte logical blocks: (19.1 TB/17.4 TiB) [ 8.965833] scsi 1:0:1:1: rdac: LUN 1 (IOSHIP) (owned) [ 8.966061] sd 1:0:1:1: [sde] 37449707520 512-byte logical blocks: (19.1 TB/17.4 TiB) [ 8.966128] sd 1:0:1:0: [sdd] Write Protect is off [ 8.966129] sd 1:0:1:0: [sdd] Mode Sense: 83 00 10 08 [ 8.966328] sd 1:0:1:0: [sdd] Write cache: enabled, read cache: enabled, supports DPO and FUA [ 8.966420] scsi 1:0:2:0: rdac: LUN 0 (IOSHIP) (owned) [ 8.966657] sd 1:0:2:0: [sdf] 926167040 512-byte logical blocks: (474 GB/441 GiB) [ 8.966658] sd 1:0:2:0: [sdf] 4096-byte physical blocks [ 8.966683] sd 1:0:1:1: [sde] Write Protect is off [ 8.966684] sd 1:0:1:1: [sde] Mode Sense: 83 00 10 08 [ 8.966953] scsi 1:0:2:1: rdac: LUN 1 (IOSHIP) (unowned) [ 8.967206] sd 1:0:1:1: [sde] Write cache: enabled, read cache: enabled, supports DPO and FUA [ 8.967236] sd 1:0:2:0: [sdf] Write Protect is off [ 8.967238] sd 1:0:2:0: [sdf] Mode Sense: 83 00 10 08 [ 8.967258] sd 1:0:2:1: [sdg] 37449707520 512-byte logical blocks: (19.1 TB/17.4 TiB) [ 8.967523] sd 1:0:2:0: [sdf] Write cache: enabled, read cache: enabled, supports DPO and FUA [ 8.967637] scsi 1:0:2:2: rdac: LUN 2 (IOSHIP) (owned) [ 8.967947] sd 1:0:2:2: [sdh] 37449707520 512-byte logical blocks: (19.1 TB/17.4 TiB) [ 8.968048] sd 1:0:2:1: [sdg] Write Protect is off [ 8.968050] sd 1:0:2:1: [sdg] Mode Sense: 83 00 10 08 [ 8.968217] scsi 1:0:3:0: rdac: LUN 0 (IOSHIP) (unowned) [ 8.968320] sd 1:0:2:1: [sdg] Write cache: enabled, read cache: enabled, supports DPO and FUA [ 8.968416] sd 1:0:3:0: [sdi] 926167040 512-byte logical blocks: (474 GB/441 GiB) [ 8.968417] sd 1:0:3:0: [sdi] 4096-byte physical blocks [ 8.968730] sd 1:0:2:2: [sdh] Write Protect is off [ 8.968731] sd 1:0:2:2: [sdh] Mode Sense: 83 00 10 08 [ 8.968784] scsi 1:0:3:1: rdac: LUN 1 (IOSHIP) (owned) [ 8.968992] sd 1:0:3:0: [sdi] Write Protect is off [ 8.968993] sd 1:0:3:0: [sdi] Mode Sense: 83 00 10 08 [ 8.969034] sd 1:0:2:2: [sdh] Write cache: enabled, read cache: enabled, supports DPO and FUA [ 8.969041] sd 1:0:0:1: [sdc] Attached SCSI disk [ 8.969092] sd 1:0:3:1: [sdj] 37449707520 512-byte logical blocks: (19.1 TB/17.4 TiB) [ 8.969244] sd 1:0:3:0: [sdi] Write cache: enabled, read cache: enabled, supports DPO and FUA [ 8.969549] scsi 1:0:3:2: rdac: LUN 2 (IOSHIP) (unowned) [ 8.970079] sd 1:0:3:2: [sdk] 37449707520 512-byte logical blocks: (19.1 TB/17.4 TiB) [ 8.970084] sd 1:0:3:1: [sdj] Write Protect is off [ 8.970085] sd 1:0:3:1: [sdj] Mode Sense: 83 00 10 08 [ 8.970342] sd 1:0:3:1: [sdj] Write cache: enabled, read cache: enabled, supports DPO and FUA 
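The sd capacity lines above are plain arithmetic on the reported block counts: 37449707520 blocks of 512 bytes is 19.17 TB decimal or 17.44 TiB binary, which the kernel truncates (not rounds) to one decimal place, hence "19.1 TB/17.4 TiB"; the same holds for the 926167040-block volumes. Worked out:

#!/usr/bin/env python3
# Worked arithmetic for the sd capacity messages: block count x block
# size, expressed in decimal and binary units. The kernel truncates
# when printing, so 19.174 TB appears as "19.1 TB".
import math

def trunc(x, digits=0):
    f = 10 ** digits
    return math.floor(x * f) / f

size = 37449707520 * 512                      # the 19.1 TB LUNs
print(trunc(size / 1000**4, 1), "TB /", trunc(size / 1024**4, 1), "TiB")

size = 926167040 * 512                        # the 474 GB LUNs
print(int(size / 1000**3), "GB /", int(size / 1024**3), "GiB")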
[ 8.970496] sd 1:0:1:1: [sde] Attached SCSI disk [ 8.970796] sd 1:0:1:0: [sdd] Attached SCSI disk [ 8.971123] sd 1:0:3:2: [sdk] Write Protect is off [ 8.971125] sd 1:0:3:2: [sdk] Mode Sense: 83 00 10 08 [ 8.971342] sd 1:0:3:2: [sdk] Write cache: enabled, read cache: enabled, supports DPO and FUA [ 8.971756] sd 1:0:2:0: [sdf] Attached SCSI disk [ 8.973050] sd 1:0:2:2: [sdh] Attached SCSI disk [ 8.973957] sd 1:0:2:1: [sdg] Attached SCSI disk [ 8.974419] sd 1:0:3:0: [sdi] Attached SCSI disk [ 8.974763] sd 1:0:3:1: [sdj] Attached SCSI disk [ 8.975659] sd 1:0:3:2: [sdk] Attached SCSI disk [ 9.253023] sd 1:0:0:0: [sdb] Mode Sense: 83 00 10 08 [ 9.253190] sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, supports DPO and FUA [ 9.263751] sd 1:0:0:0: [sdb] Attached SCSI disk [ 9.325816] EXT4-fs (sda2): mounted filesystem with ordered data mode. Opts: (null) [ 9.549986] systemd-journald[436]: Received SIGTERM from PID 1 (systemd). [ 9.590399] SELinux: Disabled at runtime. [ 9.595845] SELinux: Unregistering netfilter hooks [ 9.641027] type=1404 audit(1572832384.139:2): selinux=0 auid=4294967295 ses=4294967295 [ 9.669739] ip_tables: (C) 2000-2006 Netfilter Core Team [ 9.676453] systemd[1]: Inserted module 'ip_tables' [ 9.772722] EXT4-fs (sda2): re-mounted. Opts: (null) [ 9.790625] systemd-journald[4902]: Received request to flush runtime journal from PID 1 [ 9.866277] device-mapper: uevent: version 1.0.3 [ 9.873777] device-mapper: ioctl: 4.37.1-ioctl (2018-04-03) initialised: dm-devel@redhat.com [ 9.884852] ACPI Error: No handler for Region [SYSI] (ffff9a7269e7aa68) [IPMI] (20130517/evregion-162) [ 9.898136] ACPI Error: Region IPMI (ID=7) has no handler (20130517/exfldio-305) [ 9.909753] ACPI Error: Method parse/execution failed [\_SB_.PMI0._GHL] (Node ffff9a7269e775a0), AE_NOT_EXIST (20130517/psparse-536) [ 9.928691] ACPI Error: Method parse/execution failed [\_SB_.PMI0._PMC] (Node ffff9a7269e77500), AE_NOT_EXIST (20130517/psparse-536) [ 9.947639] ACPI Exception: AE_NOT_EXIST, Evaluating _PMC (20130517/power_meter-753) [ 9.960709] piix4_smbus 0000:00:14.0: SMBus Host Controller at 0xb00, revision 0 [ 9.968426] piix4_smbus 0000:00:14.0: Using register 0x2e for SMBus port selection [ 9.978236] ipmi message handler version 39.2 [ 9.983954] ccp 0000:02:00.2: 3 command queues available [ 9.990013] ccp 0000:02:00.2: irq 235 for MSI/MSI-X [ 9.990045] ccp 0000:02:00.2: irq 236 for MSI/MSI-X [ 9.990197] ccp 0000:02:00.2: Queue 2 can access 4 LSB regions [ 9.997057] ccp 0000:02:00.2: Queue 3 can access 4 LSB regions [ 9.998198] input: PC Speaker as /devices/platform/pcspkr/input/input2 [ 10.012234] ccp 0000:02:00.2: Queue 4 can access 4 LSB regions [ 10.019607] ccp 0000:02:00.2: Queue 0 gets LSB 4 [ 10.025513] ccp 0000:02:00.2: Queue 1 gets LSB 5 [ 10.025570] ipmi device interface [ 10.036238] ccp 0000:02:00.2: Queue 2 gets LSB 6 [ 10.044692] sd 0:2:0:0: Attached scsi generic sg0 type 0 [ 10.050994] sd 1:0:0:0: Attached scsi generic sg1 type 0 [ 10.051051] cryptd: max_cpu_qlen set to 1000 [ 10.051450] ccp 0000:02:00.2: enabled [ 10.051696] ccp 0000:03:00.1: 5 command queues available [ 10.051764] ccp 0000:03:00.1: irq 238 for MSI/MSI-X [ 10.051798] ccp 0000:03:00.1: Queue 0 can access 7 LSB regions [ 10.051801] ccp 0000:03:00.1: Queue 1 can access 7 LSB regions [ 10.051803] ccp 0000:03:00.1: Queue 2 can access 7 LSB regions [ 10.051806] ccp 0000:03:00.1: Queue 3 can access 7 LSB regions [ 10.051808] ccp 0000:03:00.1: Queue 4 can access 7 LSB regions [ 10.051810] ccp 0000:03:00.1: Queue 0 gets LSB 1 
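The audit records scattered through this log (type=2000, type=1404, type=1305) embed wall-clock time as audit(EPOCH.MILLIS:SERIAL). Decoding them gives UTC times that bracket the rtc_cmos line setting the system clock to 2019-11-04 01:53:00 UTC. A small decoder, fed the three stamps that appear in this log:

#!/usr/bin/env python3
# Minimal sketch: decode audit(SECONDS.MILLIS:SERIAL) stamps from this
# log into UTC wall-clock time.
from datetime import datetime, timezone

def audit_time(stamp):
    seconds, serial = stamp.split(":")
    return datetime.fromtimestamp(float(seconds), tz=timezone.utc), int(serial)

for stamp in ("1572832374.193:1", "1572832384.139:2", "1572832423.682:3"):
    when, serial = audit_time(stamp)
    print(f"audit record {serial}: {when:%Y-%m-%d %H:%M:%S} UTC")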
[ 10.051811] ccp 0000:03:00.1: Queue 1 gets LSB 2 [ 10.051813] ccp 0000:03:00.1: Queue 2 gets LSB 3 [ 10.051814] ccp 0000:03:00.1: Queue 3 gets LSB 4 [ 10.051815] ccp 0000:03:00.1: Queue 4 gets LSB 5 [ 10.052501] ccp 0000:03:00.1: enabled [ 10.052709] ccp 0000:41:00.2: 3 command queues available [ 10.052760] ccp 0000:41:00.2: irq 240 for MSI/MSI-X [ 10.052781] ccp 0000:41:00.2: irq 241 for MSI/MSI-X [ 10.052828] ccp 0000:41:00.2: Queue 2 can access 4 LSB regions [ 10.052831] ccp 0000:41:00.2: Queue 3 can access 4 LSB regions [ 10.052833] ccp 0000:41:00.2: Queue 4 can access 4 LSB regions [ 10.052834] ccp 0000:41:00.2: Queue 0 gets LSB 4 [ 10.052836] ccp 0000:41:00.2: Queue 1 gets LSB 5 [ 10.052837] ccp 0000:41:00.2: Queue 2 gets LSB 6 [ 10.054611] ccp 0000:41:00.2: enabled [ 10.054769] ccp 0000:42:00.1: 5 command queues available [ 10.054822] ccp 0000:42:00.1: irq 243 for MSI/MSI-X [ 10.055148] ccp 0000:42:00.1: Queue 0 can access 7 LSB regions [ 10.055151] ccp 0000:42:00.1: Queue 1 can access 7 LSB regions [ 10.055153] ccp 0000:42:00.1: Queue 2 can access 7 LSB regions [ 10.055155] ccp 0000:42:00.1: Queue 3 can access 7 LSB regions [ 10.055156] ccp 0000:42:00.1: Queue 4 can access 7 LSB regions [ 10.055158] ccp 0000:42:00.1: Queue 0 gets LSB 1 [ 10.055159] ccp 0000:42:00.1: Queue 1 gets LSB 2 [ 10.055160] ccp 0000:42:00.1: Queue 2 gets LSB 3 [ 10.055160] ccp 0000:42:00.1: Queue 3 gets LSB 4 [ 10.055161] ccp 0000:42:00.1: Queue 4 gets LSB 5 [ 10.056425] ccp 0000:42:00.1: enabled [ 10.056683] ccp 0000:85:00.2: 3 command queues available [ 10.056739] ccp 0000:85:00.2: irq 245 for MSI/MSI-X [ 10.056764] ccp 0000:85:00.2: irq 246 for MSI/MSI-X [ 10.056833] ccp 0000:85:00.2: Queue 2 can access 4 LSB regions [ 10.056836] ccp 0000:85:00.2: Queue 3 can access 4 LSB regions [ 10.056838] ccp 0000:85:00.2: Queue 4 can access 4 LSB regions [ 10.056839] ccp 0000:85:00.2: Queue 0 gets LSB 4 [ 10.056840] ccp 0000:85:00.2: Queue 1 gets LSB 5 [ 10.056842] ccp 0000:85:00.2: Queue 2 gets LSB 6 [ 10.057469] ccp 0000:85:00.2: enabled [ 10.057627] ccp 0000:86:00.1: 5 command queues available [ 10.057678] ccp 0000:86:00.1: irq 248 for MSI/MSI-X [ 10.057706] ccp 0000:86:00.1: Queue 0 can access 7 LSB regions [ 10.057709] ccp 0000:86:00.1: Queue 1 can access 7 LSB regions [ 10.057711] ccp 0000:86:00.1: Queue 2 can access 7 LSB regions [ 10.057713] ccp 0000:86:00.1: Queue 3 can access 7 LSB regions [ 10.057715] ccp 0000:86:00.1: Queue 4 can access 7 LSB regions [ 10.057716] ccp 0000:86:00.1: Queue 0 gets LSB 1 [ 10.057717] ccp 0000:86:00.1: Queue 1 gets LSB 2 [ 10.057718] ccp 0000:86:00.1: Queue 2 gets LSB 3 [ 10.057720] ccp 0000:86:00.1: Queue 3 gets LSB 4 [ 10.057721] ccp 0000:86:00.1: Queue 4 gets LSB 5 [ 10.058223] ccp 0000:86:00.1: enabled [ 10.058662] ccp 0000:c2:00.2: 3 command queues available [ 10.058720] ccp 0000:c2:00.2: irq 250 for MSI/MSI-X [ 10.058746] ccp 0000:c2:00.2: irq 251 for MSI/MSI-X [ 10.058797] ccp 0000:c2:00.2: Queue 2 can access 4 LSB regions [ 10.058799] ccp 0000:c2:00.2: Queue 3 can access 4 LSB regions [ 10.058801] ccp 0000:c2:00.2: Queue 4 can access 4 LSB regions [ 10.058803] ccp 0000:c2:00.2: Queue 0 gets LSB 4 [ 10.058804] ccp 0000:c2:00.2: Queue 1 gets LSB 5 [ 10.058806] ccp 0000:c2:00.2: Queue 2 gets LSB 6 [ 10.059503] ccp 0000:c2:00.2: enabled [ 10.059630] ccp 0000:c3:00.1: 5 command queues available [ 10.059672] ccp 0000:c3:00.1: irq 253 for MSI/MSI-X [ 10.059695] ccp 0000:c3:00.1: Queue 0 can access 7 LSB regions [ 10.059696] ccp 0000:c3:00.1: Queue 1 can access 7 LSB regions [ 
10.059698] ccp 0000:c3:00.1: Queue 2 can access 7 LSB regions [ 10.059700] ccp 0000:c3:00.1: Queue 3 can access 7 LSB regions [ 10.059702] ccp 0000:c3:00.1: Queue 4 can access 7 LSB regions [ 10.059703] ccp 0000:c3:00.1: Queue 0 gets LSB 1 [ 10.059704] ccp 0000:c3:00.1: Queue 1 gets LSB 2 [ 10.059705] ccp 0000:c3:00.1: Queue 2 gets LSB 3 [ 10.059706] ccp 0000:c3:00.1: Queue 3 gets LSB 4 [ 10.059707] ccp 0000:c3:00.1: Queue 4 gets LSB 5 [ 10.060063] ccp 0000:c3:00.1: enabled [ 10.367213] sd 1:0:0:1: Attached scsi generic sg2 type 0 [ 10.367396] scsi 1:0:0:31: Attached scsi generic sg3 type 0 [ 10.367569] sd 1:0:1:0: Attached scsi generic sg4 type 0 [ 10.367719] sd 1:0:1:1: Attached scsi generic sg5 type 0 [ 10.367864] scsi 1:0:1:31: Attached scsi generic sg6 type 0 [ 10.368136] sd 1:0:2:0: Attached scsi generic sg7 type 0 [ 10.368279] sd 1:0:2:1: Attached scsi generic sg8 type 0 [ 10.368482] sd 1:0:2:2: Attached scsi generic sg9 type 0 [ 10.368634] scsi 1:0:2:31: Attached scsi generic sg10 type 0 [ 10.368802] sd 1:0:3:0: Attached scsi generic sg11 type 0 [ 10.368981] sd 1:0:3:1: Attached scsi generic sg12 type 0 [ 10.369280] sd 1:0:3:2: Attached scsi generic sg13 type 0 [ 10.369443] scsi 1:0:3:31: Attached scsi generic sg14 type 0 [ 10.369691] scsi 1:0:4:0: Attached scsi generic sg15 type 13 [ 10.448051] IPMI System Interface driver [ 10.448074] AVX2 version of gcm_enc/dec engaged. [ 10.448076] ipmi_si dmi-ipmi-si.0: ipmi_platform: probing via SMBIOS [ 10.448078] AES CTR mode by8 optimization enabled [ 10.448080] ipmi_si: SMBIOS: io 0xca8 regsize 1 spacing 4 irq 10 [ 10.448082] ipmi_si: Adding SMBIOS-specified kcs state machine [ 10.448116] ipmi_si IPI0001:00: ipmi_platform: probing via ACPI [ 10.448151] ipmi_si IPI0001:00: [io 0x0ca8] regsize 1 spacing 4 irq 10 [ 10.448154] ipmi_si dmi-ipmi-si.0: Removing SMBIOS-specified kcs state machine in favor of ACPI [ 10.448155] ipmi_si: Adding ACPI-specified kcs state machine [ 10.448277] ipmi_si: Trying ACPI-specified kcs state machine at i/o address 0xca8, slave address 0x20, irq 10 [ 10.451803] alg: No test for __gcm-aes-aesni (__driver-gcm-aes-aesni) [ 10.451882] alg: No test for __generic-gcm-aes-aesni (__driver-generic-gcm-aes-aesni) [ 10.483084] ipmi_si IPI0001:00: The BMC does not support setting the recv irq bit, compensating, but the BMC needs to be fixed. [ 10.490154] ipmi_si IPI0001:00: Using irq 10 [ 10.496111] dcdbas dcdbas: Dell Systems Management Base Driver (version 5.6.0-3.3) [ 10.512819] ipmi_si IPI0001:00: Found new BMC (man_id: 0x0002a2, prod_id: 0x0100, dev_id: 0x20) [ 10.586078] ipmi_si IPI0001:00: IPMI kcs interface initialized [ 10.719759] sd 1:0:0:0: Embedded Enclosure Device [ 10.723071] sd 1:0:0:1: Embedded Enclosure Device [ 10.725151] scsi 1:0:0:31: Embedded Enclosure Device [ 10.727192] sd 1:0:1:0: Embedded Enclosure Device [ 10.729462] sd 1:0:1:1: Embedded Enclosure Device [ 10.731591] scsi 1:0:1:31: Embedded Enclosure Device [ 10.733707] sd 1:0:2:0: Embedded Enclosure Device [ 10.735927] sd 1:0:2:1: Embedded Enclosure Device [ 10.738038] sd 1:0:2:2: Embedded Enclosure Device [ 10.740180] scsi 1:0:2:31: Embedded Enclosure Device [ 10.742275] sd 1:0:3:0: Embedded Enclosure Device [ 10.744540] sd 1:0:3:1: Embedded Enclosure Device [ 10.746683] sd 1:0:3:2: Embedded Enclosure Device [ 10.748834] scsi 1:0:3:31: Embedded Enclosure Device [ 10.751754] ses 1:0:4:0: Attached Enclosure device [ 10.869311] kvm: Nested Paging enabled [ 10.876180] MCE: In-kernel MCE decoding enabled. 
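The ipmi_si probe above settles on the ACPI-specified kcs interface at i/o 0xca8 and registers the BMC (man_id 0x0002a2, prod_id 0x0100, dev_id 0x20). A minimal cross-check from userspace, assuming ipmitool is installed on this host (it is not shown in the log):

# Query the BMC over the newly initialized KCS interface; the IDs printed
# should match the man_id/prod_id/dev_id that ipmi_si reported above.
ipmitool mc info
# Recent System Event Log entries are often worth reading next to a dmesg like this.
ipmitool sel list | tail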
[ 10.880079] AMD64 EDAC driver v3.4.0 [ 10.880096] EDAC amd64: DRAM ECC enabled. [ 10.880097] EDAC amd64: F17h detected (node 0). [ 10.880141] EDAC MC: UMC0 chip selects: [ 10.880142] EDAC amd64: MC: 0: 0MB 1: 0MB [ 10.880142] EDAC amd64: MC: 2: 16383MB 3: 16383MB [ 10.880143] EDAC amd64: MC: 4: 0MB 5: 0MB [ 10.880144] EDAC amd64: MC: 6: 0MB 7: 0MB [ 10.880146] EDAC MC: UMC1 chip selects: [ 10.880147] EDAC amd64: MC: 0: 0MB 1: 0MB [ 10.880147] EDAC amd64: MC: 2: 16383MB 3: 16383MB [ 10.880148] EDAC amd64: MC: 4: 0MB 5: 0MB [ 10.880148] EDAC amd64: MC: 6: 0MB 7: 0MB [ 10.880149] EDAC amd64: using x8 syndromes. [ 10.880149] EDAC amd64: MCT channel count: 2 [ 10.880310] EDAC MC0: Giving out device to 'amd64_edac' 'F17h': DEV 0000:00:18.3 [ 10.880315] EDAC amd64: DRAM ECC enabled. [ 10.880316] EDAC amd64: F17h detected (node 1). [ 10.880354] EDAC MC: UMC0 chip selects: [ 10.880354] EDAC amd64: MC: 0: 0MB 1: 0MB [ 10.880355] EDAC amd64: MC: 2: 16383MB 3: 16383MB [ 10.880356] EDAC amd64: MC: 4: 0MB 5: 0MB [ 10.880356] EDAC amd64: MC: 6: 0MB 7: 0MB [ 10.880358] EDAC MC: UMC1 chip selects: [ 10.880359] EDAC amd64: MC: 0: 0MB 1: 0MB [ 10.880360] EDAC amd64: MC: 2: 16383MB 3: 16383MB [ 10.880360] EDAC amd64: MC: 4: 0MB 5: 0MB [ 10.880361] EDAC amd64: MC: 6: 0MB 7: 0MB [ 10.880361] EDAC amd64: using x8 syndromes. [ 10.880361] EDAC amd64: MCT channel count: 2 [ 10.880505] EDAC MC1: Giving out device to 'amd64_edac' 'F17h': DEV 0000:00:19.3 [ 10.880624] EDAC amd64: DRAM ECC enabled. [ 10.880625] EDAC amd64: F17h detected (node 2). [ 10.881399] EDAC MC: UMC0 chip selects: [ 10.881400] EDAC amd64: MC: 0: 0MB 1: 0MB [ 10.881401] EDAC amd64: MC: 2: 16383MB 3: 16383MB [ 10.881401] EDAC amd64: MC: 4: 0MB 5: 0MB [ 10.881402] EDAC amd64: MC: 6: 0MB 7: 0MB [ 10.881460] EDAC MC: UMC1 chip selects: [ 10.881460] EDAC amd64: MC: 0: 0MB 1: 0MB [ 10.881461] EDAC amd64: MC: 2: 16383MB 3: 16383MB [ 10.881462] EDAC amd64: MC: 4: 0MB 5: 0MB [ 10.881462] EDAC amd64: MC: 6: 0MB 7: 0MB [ 10.881463] EDAC amd64: using x8 syndromes. [ 10.881463] EDAC amd64: MCT channel count: 2 [ 10.881722] EDAC MC2: Giving out device to 'amd64_edac' 'F17h': DEV 0000:00:1a.3 [ 10.881728] EDAC amd64: DRAM ECC enabled. [ 10.881729] EDAC amd64: F17h detected (node 3). [ 10.881772] EDAC MC: UMC0 chip selects: [ 10.881772] EDAC amd64: MC: 0: 0MB 1: 0MB [ 10.881773] EDAC amd64: MC: 2: 16383MB 3: 16383MB [ 10.881774] EDAC amd64: MC: 4: 0MB 5: 0MB [ 10.881774] EDAC amd64: MC: 6: 0MB 7: 0MB [ 10.881777] EDAC MC: UMC1 chip selects: [ 10.881777] EDAC amd64: MC: 0: 0MB 1: 0MB [ 10.881778] EDAC amd64: MC: 2: 16383MB 3: 16383MB [ 10.881778] EDAC amd64: MC: 4: 0MB 5: 0MB [ 10.881779] EDAC amd64: MC: 6: 0MB 7: 0MB [ 10.881779] EDAC amd64: using x8 syndromes. [ 10.881780] EDAC amd64: MCT channel count: 2 [ 10.882572] EDAC MC3: Giving out device to 'amd64_edac' 'F17h': DEV 0000:00:1b.3 [ 10.882642] EDAC PCI0: Giving out device to module 'amd64_edac' controller 'EDAC PCI controller': DEV '0000:00:18.0' (POLLED) [ 35.321573] device-mapper: multipath round-robin: version 1.2.0 loaded [ 49.146413] Adding 4194300k swap on /dev/sda3. Priority:-2 extents:1 across:4194300k FS [ 49.185507] type=1305 audit(1572832423.682:3): audit_pid=11690 old=0 auid=4294967295 ses=4294967295 res=1 [ 49.206756] RPC: Registered named UNIX socket transport module. [ 49.213978] RPC: Registered udp transport module. [ 49.220054] RPC: Registered tcp transport module. [ 49.226145] RPC: Registered tcp NFSv4.1 backchannel transport module. 
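With amd64_edac now driving MC0 through MC3 (one per NUMA node) and DRAM ECC enabled, corrected and uncorrected error counts are exported through the standard EDAC sysfs tree. A quick check, assuming only the stock sysfs layout:

# Per-memory-controller ECC counters: ce_count is corrected errors,
# ue_count is uncorrected errors.
grep . /sys/devices/system/edac/mc/mc*/ce_count
grep . /sys/devices/system/edac/mc/mc*/ue_count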
[ 49.885795] mlx5_core 0000:01:00.0: slow_pci_heuristic:5575:(pid 11998): Max link speed = 100000, PCI BW = 126016 [ 49.896117] mlx5_core 0000:01:00.0: MLX5E: StrdRq(0) RqSz(1024) StrdSz(256) RxCqeCmprss(0) [ 49.904393] mlx5_core 0000:01:00.0: MLX5E: StrdRq(0) RqSz(1024) StrdSz(256) RxCqeCmprss(0) [ 50.341640] tg3 0000:81:00.0: irq 254 for MSI/MSI-X [ 50.341654] tg3 0000:81:00.0: irq 255 for MSI/MSI-X [ 50.341666] tg3 0000:81:00.0: irq 256 for MSI/MSI-X [ 50.341676] tg3 0000:81:00.0: irq 257 for MSI/MSI-X [ 50.341690] tg3 0000:81:00.0: irq 258 for MSI/MSI-X [ 50.467787] IPv6: ADDRCONF(NETDEV_UP): em1: link is not ready [ 54.040307] tg3 0000:81:00.0 em1: Link is up at 1000 Mbps, full duplex [ 54.046847] tg3 0000:81:00.0 em1: Flow control is on for TX and on for RX [ 54.053639] tg3 0000:81:00.0 em1: EEE is enabled [ 54.058291] IPv6: ADDRCONF(NETDEV_CHANGE): em1: link becomes ready [ 54.825434] IPv6: ADDRCONF(NETDEV_UP): ib0: link is not ready [ 55.111078] IPv6: ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready [ 59.243851] FS-Cache: Loaded [ 59.274341] FS-Cache: Netfs 'nfs' registered for caching [ 59.283675] Key type dns_resolver registered [ 59.312170] NFS: Registering the id_resolver key type [ 59.318380] Key type id_resolver registered [ 59.323957] Key type id_legacy registered [ 5706.018264] LNet: HW NUMA nodes: 4, HW CPU cores: 48, npartitions: 4 [ 5706.025852] alg: No test for adler32 (adler32-zlib) [ 5706.825894] Lustre: Lustre: Build Version: 2.12.3_2_gb033996 [ 5706.933072] LNet: 39500:0:(config.c:1627:lnet_inet_enumerate()) lnet: Ignoring interface em2: it's down [ 5706.942862] LNet: Using FastReg for registration [ 5706.958897] LNet: Added LNI 10.0.10.53@o2ib7 [8/256/0/180] [ 7739.953938] LDISKFS-fs (dm-0): file extents enabled, maximum tree depth=5 [ 7740.040559] LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,acl,no_mbcache,nodelalloc [ 7740.982682] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.109.69@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. [ 7741.000078] LustreError: Skipped 1 previous similar message [ 7741.050076] Lustre: fir-MDT0002: Not available for connect from 10.0.10.54@o2ib7 (not set up) [ 7741.550822] LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.26.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. [ 7741.568105] LustreError: Skipped 13 previous similar messages [ 7742.284012] Lustre: fir-MDT0002: Not available for connect from 10.9.101.19@o2ib4 (not set up) [ 7742.292624] Lustre: Skipped 5 previous similar messages [ 7742.575160] LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.9.109.65@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. 
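At this point LNet has added the NI 10.0.10.53@o2ib7 and clients are already attempting to connect before the MDT is set up, which is why the 137-5 "no target" errors are expected during mount. A hedged sanity check of the fabric from this node, assuming the Lustre 2.12 userspace utilities that match the kernel modules above:

# Show the o2ib7 network interface that LNet just brought up.
lnetctl net show --net o2ib7
# LNet-level ping of the peer MDS seen in the connection messages.
lctl ping 10.0.10.54@o2ib7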
[ 7742.592543] LustreError: Skipped 35 previous similar messages [ 7743.121956] LustreError: 11-0: fir-MDT0001-osp-MDT0002: operation mds_connect to node 10.0.10.52@o2ib7 failed: rc = -114 [ 7743.323523] Lustre: fir-MDT0002: Imperative Recovery not enabled, recovery window 300-900 [ 7743.354056] Lustre: fir-MDD0002: changelog on [ 7743.361536] Lustre: fir-MDT0002: in recovery but waiting for the first client to connect [ 7743.449810] Lustre: fir-MDT0002: Will be in recovery for at least 5:00, or until 1252 clients reconnect [ 7744.459449] Lustre: fir-MDT0002: Connection restored to 0adbdc36-88db-ed9c-0b60-ad3723a98d21 (at 10.9.115.2@o2ib4) [ 7744.469808] Lustre: Skipped 5 previous similar messages [ 7744.581855] LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.8.21@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. [ 7744.599048] LustreError: Skipped 32 previous similar messages [ 7745.055987] Lustre: fir-MDT0002: Connection restored to a237cf27-679e-e69b-02be-ab050bff766b (at 10.9.114.14@o2ib4) [ 7745.066422] Lustre: Skipped 4 previous similar messages [ 7746.079912] Lustre: fir-MDT0002: Connection restored to d3ddcfcc-0a22-2702-0915-85d5cd98973e (at 10.9.102.42@o2ib4) [ 7746.090348] Lustre: Skipped 12 previous similar messages [ 7748.092265] Lustre: fir-MDT0002: Connection restored to 06402f94-dd65-b6c2-7925-b84955058da8 (at 10.8.25.27@o2ib6) [ 7748.102617] Lustre: Skipped 38 previous similar messages [ 7748.800881] LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.18.31@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. [ 7748.818159] LustreError: Skipped 40 previous similar messages [ 7752.116989] Lustre: fir-MDT0002: Connection restored to 72d49708-1e0b-4f0b-878b-acdad5cc968c (at 10.9.117.35@o2ib4) [ 7752.127460] Lustre: Skipped 76 previous similar messages [ 7757.039123] LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.18.32@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. [ 7757.056404] LustreError: Skipped 55 previous similar messages [ 7760.320786] Lustre: fir-MDT0002: Connection restored to fbefd9c2-b03e-16ab-7b85-ec9f835d33da (at 10.9.105.22@o2ib4) [ 7760.331222] Lustre: Skipped 110 previous similar messages [ 7768.402937] LustreError: 11-0: fir-OST0000-osc-MDT0002: operation ost_connect to node 10.0.10.101@o2ib7 failed: rc = -16 [ 7768.413807] LustreError: Skipped 193 previous similar messages [ 7773.163090] LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.9.105.31@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. [ 7773.180469] LustreError: Skipped 260 previous similar messages [ 7776.328754] Lustre: fir-MDT0002: Connection restored to 51ad6e41-5953-1177-9c9f-c518933cffa5 (at 10.9.108.8@o2ib4) [ 7776.339105] Lustre: Skipped 274 previous similar messages [ 7793.491622] LustreError: 11-0: fir-OST0000-osc-MDT0002: operation ost_connect to node 10.0.10.101@o2ib7 failed: rc = -16 [ 7793.502501] LustreError: Skipped 95 previous similar messages [ 7805.174769] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.8.22.18@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
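fir-MDT0002 is now in recovery waiting for 1252 clients, and the repeated ost_connect failures with rc = -16 are -EBUSY from OSTs that are still recovering themselves. Rather than following the console, recovery progress can be polled; a sketch assuming the standard mdt parameters:

# Recovery state for this target: status, elapsed/remaining time, and
# counts of completed versus queued clients.
lctl get_param mdt.fir-MDT0002.recovery_status
# Re-check every 30 seconds while the recovery window runs down.
watch -n 30 'lctl get_param -n mdt.fir-MDT0002.recovery_status'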
[ 7805.192053] LustreError: Skipped 338 previous similar messages [ 7808.432507] Lustre: fir-MDT0002: Connection restored to (at 10.9.116.19@o2ib4) [ 7808.439831] Lustre: Skipped 606 previous similar messages [ 7818.580150] LustreError: 11-0: fir-OST0000-osc-MDT0002: operation ost_connect to node 10.0.10.101@o2ib7 failed: rc = -16 [ 7818.591028] LustreError: Skipped 95 previous similar messages [ 7843.668844] LustreError: 11-0: fir-OST0000-osc-MDT0002: operation ost_connect to node 10.0.10.101@o2ib7 failed: rc = -16 [ 7843.679715] LustreError: Skipped 95 previous similar messages [ 7865.379956] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnected, waiting for 1252 clients in recovery for 2:58 [ 7868.758302] LustreError: 11-0: fir-OST0002-osc-MDT0002: operation ost_connect to node 10.0.10.101@o2ib7 failed: rc = -16 [ 7868.769174] LustreError: Skipped 95 previous similar messages [ 7869.516210] Lustre: fir-MDT0002: Client 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4) reconnected, waiting for 1252 clients in recovery for 2:53 [ 7873.275684] Lustre: fir-MDT0002: Connection restored to (at 10.9.105.33@o2ib4) [ 7873.282999] Lustre: Skipped 202 previous similar messages [ 7893.846198] LustreError: 11-0: fir-OST0000-osc-MDT0002: operation ost_connect to node 10.0.10.101@o2ib7 failed: rc = -16 [ 7893.857100] LustreError: Skipped 95 previous similar messages [ 7944.023670] LustreError: 11-0: fir-OST0001-osc-MDT0002: operation ost_connect to node 10.0.10.102@o2ib7 failed: rc = -16 [ 7944.034560] LustreError: Skipped 191 previous similar messages [ 7981.182244] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnected, waiting for 1252 clients in recovery for 1:02 [ 7985.440165] Lustre: fir-MDT0002: Client 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4) reconnected, waiting for 1252 clients in recovery for 0:58 [ 8019.289515] LustreError: 11-0: fir-OST0001-osc-MDT0002: operation ost_connect to node 10.0.10.102@o2ib7 failed: rc = -16 [ 8019.300383] LustreError: Skipped 287 previous similar messages [ 8041.482684] Lustre: fir-MDT0003-osp-MDT0002: Connection restored to 10.0.10.54@o2ib7 (at 10.0.10.54@o2ib7) [ 8041.492345] Lustre: Skipped 10 previous similar messages [ 8043.457920] Lustre: fir-MDT0002: recovery is timed out, evict stale exports [ 8043.465180] Lustre: fir-MDT0002: disconnecting 2 stale clients [ 8081.536791] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnected, waiting for 1252 clients in recovery for 1:42 [ 8169.821740] LustreError: 11-0: fir-OST0003-osc-MDT0002: operation ost_connect to node 10.0.10.102@o2ib7 failed: rc = -16 [ 8169.832616] LustreError: Skipped 245 previous similar messages [ 8181.891425] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnected, waiting for 1252 clients in recovery for 0:01 [ 8181.905590] Lustre: Skipped 1 previous similar message [ 8186.149498] Lustre: fir-MDT0002: Recovery already passed deadline 0:02. If you do not want to wait more, you may force target eviction via 'lctl --device fir-MDT0002 abort_recovery'. [ 8282.246353] Lustre: fir-MDT0002: Recovery already passed deadline 1:38. If you do not want to wait more, you may force target eviction via 'lctl --device fir-MDT0002 abort_recovery'. [ 8286.504232] Lustre: fir-MDT0002: Recovery already passed deadline 1:42. 
If you do not want to wait more, you may force target eviction via 'lctl --device fir-MDT0002 abort_recovery'. [ 8407.689939] Lustre: fir-MDT0002: Recovery already passed deadline 3:44. If you do not want to wait more, you may force target eviction via 'lctl --device fir-MDT0002 abort_recovery'. [ 8407.706040] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) [ 8407.716471] Lustre: Skipped 8 previous similar messages [ 8411.947560] Lustre: fir-MDT0002: Recovery already passed deadline 3:48. If you do not want to wait more, you may force target eviction via 'lctl --device fir-MDT0002 abort_recovery'. [ 8445.797153] LustreError: 11-0: fir-OST0007-osc-MDT0002: operation ost_connect to node 10.0.10.102@o2ib7 failed: rc = -16 [ 8445.808035] LustreError: Skipped 329 previous similar messages [ 8533.132763] Lustre: fir-MDT0002: Recovery already passed deadline 5:49. If you do not want to wait more, you may force target eviction via 'lctl --device fir-MDT0002 abort_recovery'. [ 8658.576337] Lustre: fir-MDT0002: Recovery already passed deadline 7:54. If you do not want to wait more, you may force target eviction via 'lctl --device fir-MDT0002 abort_recovery'. [ 8658.592405] Lustre: Skipped 1 previous similar message [ 8784.019577] Lustre: fir-MDT0002: Recovery already passed deadline 10:00. If you do not want to wait more, you may force target eviction via 'lctl --device fir-MDT0002 abort_recovery'. [ 8784.035734] Lustre: Skipped 1 previous similar message [ 8909.462891] Lustre: fir-MDT0002: Recovery already passed deadline 12:05. If you do not want to wait more, you may force target eviction via 'lctl --device fir-MDT0002 abort_recovery'. [ 8909.479047] Lustre: Skipped 1 previous similar message [ 8922.247774] Lustre: fir-MDT0002: Denying connection for new client 9de16268-0600-0012-e5c1-5468b74b877d (at 10.8.26.4@o2ib6), waiting for 1252 known clients (995 recovered, 255 in progress, and 2 evicted) already passed deadline 12:18 [ 8923.261400] Lustre: fir-MDT0002: Denying connection for new client 9de16268-0600-0012-e5c1-5468b74b877d (at 10.8.26.4@o2ib6), waiting for 1252 known clients (995 recovered, 255 in progress, and 2 evicted) already passed deadline 12:19 [ 8948.927610] Lustre: fir-MDT0002: Denying connection for new client 9de16268-0600-0012-e5c1-5468b74b877d (at 10.8.26.4@o2ib6), waiting for 1252 known clients (995 recovered, 255 in progress, and 2 evicted) already passed deadline 12:45 [ 8972.659441] LustreError: 11-0: fir-OST0007-osc-MDT0002: operation ost_connect to node 10.0.10.102@o2ib7 failed: rc = -16 [ 8972.670327] LustreError: Skipped 629 previous similar messages [ 8974.016130] Lustre: fir-MDT0002: Denying connection for new client 9de16268-0600-0012-e5c1-5468b74b877d (at 10.8.26.4@o2ib6), waiting for 1252 known clients (995 recovered, 255 in progress, and 2 evicted) already passed deadline 13:10 [ 8999.104663] Lustre: fir-MDT0002: Denying connection for new client 9de16268-0600-0012-e5c1-5468b74b877d (at 10.8.26.4@o2ib6), waiting for 1252 known clients (995 recovered, 255 in progress, and 2 evicted) already passed deadline 13:35 [ 9024.193339] Lustre: fir-MDT0002: Denying connection for new client 9de16268-0600-0012-e5c1-5468b74b877d (at 10.8.26.4@o2ib6), waiting for 1252 known clients (995 recovered, 255 in progress, and 2 evicted) already passed deadline 14:00 [ 9049.281958] Lustre: fir-MDT0002: Denying connection for new client 9de16268-0600-0012-e5c1-5468b74b877d (at 10.8.26.4@o2ib6), waiting for 1252 known clients (995 recovered, 255 
in progress, and 2 evicted) already passed deadline 14:25 [ 9059.995020] Lustre: fir-MDT0002: Recovery already passed deadline 14:36. If you do not want to wait more, you may force target eviction via 'lctl --device fir-MDT0002 abort_recovery'. [ 9060.011195] Lustre: Skipped 1 previous similar message [ 9060.016361] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) [ 9060.026816] Lustre: Skipped 9 previous similar messages [ 9099.459291] Lustre: fir-MDT0002: Denying connection for new client 9de16268-0600-0012-e5c1-5468b74b877d (at 10.8.26.4@o2ib6), waiting for 1252 known clients (995 recovered, 255 in progress, and 2 evicted) already passed deadline 15:15 [ 9099.480045] Lustre: Skipped 1 previous similar message [ 9174.725341] Lustre: fir-MDT0002: Denying connection for new client 9de16268-0600-0012-e5c1-5468b74b877d (at 10.8.26.4@o2ib6), waiting for 1252 known clients (995 recovered, 255 in progress, and 2 evicted) already passed deadline 16:31 [ 9174.746092] Lustre: Skipped 2 previous similar messages [ 9181.691533] LustreError: 43073:0:(mdt_handler.c:6687:mdt_iocontrol()) fir-MDT0002: Aborting recovery for device [ 9181.701624] LustreError: 43073:0:(ldlm_lib.c:2605:target_stop_recovery_thread()) fir-MDT0002: Aborting recovery [ 9181.711720] Lustre: 40781:0:(ldlm_lib.c:2056:target_recovery_overseer()) recovery is aborted, evict exports in recovery [ 9181.722818] Lustre: fir-MDT0002: disconnecting 255 stale clients [ 9181.759981] LustreError: 40781:0:(ldlm_lib.c:1634:abort_lock_replay_queue()) @@@ aborted: req@ffff9a619d667980 x1648768283432576/t0(0) o101->94e48e69-f86a-1787-8aae-b919291854b5@10.8.18.33@o2ib6:658/0 lens 328/0 e 49 to 0 dl 1572841573 ref 1 fl Complete:/40/ffffffff rc 0/-1 [ 9181.793803] Lustre: 40781:0:(ldlm_lib.c:2046:target_recovery_overseer()) fir-MDT0002 recovery is aborted by hard timeout [ 9181.838829] Lustre: 40781:0:(ldlm_lib.c:2550:target_recovery_thread()) too long recovery - read logs [ 9181.848135] Lustre: fir-MDT0002: Recovery over after 23:59, of 1252 clients 995 recovered and 257 were evicted. [ 9181.848138] LustreError: dumping log to /tmp/lustre-log.1572841556.40781 [ 9361.058960] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting [ 9365.317132] Lustre: fir-MDT0002: Client 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4) reconnecting [ 9408.727215] Lustre: fir-MDT0002: haven't heard from client 3a3631d2-ae54-0870-4eb2-d0897b427eea (at 10.8.28.5@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6f2a32cc00, cur 1572841783 expire 1572841633 last 1572841556 [ 9662.123369] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting [ 9662.133565] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) [ 9662.144000] Lustre: Skipped 292 previous similar messages [ 9666.381042] Lustre: fir-MDT0002: Client 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4) reconnecting [ 9676.719621] Lustre: fir-MDT0002: haven't heard from client 9de16268-0600-0012-e5c1-5468b74b877d (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9a81a7c55800, cur 1572842051 expire 1572841901 last 1572841824 [ 9676.741238] Lustre: Skipped 31 previous similar messages [ 9862.832553] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting [ 9963.187118] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting [ 9963.197292] Lustre: Skipped 1 previous similar message [ 9988.731320] Lustre: fir-MDT0002: haven't heard from client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a90def9a800, cur 1572842363 expire 1572842213 last 1572842136 [ 9988.753116] Lustre: Skipped 2 previous similar messages [10214.073997] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting [10214.084171] Lustre: Skipped 1 previous similar message [10314.738171] Lustre: fir-MDT0002: haven't heard from client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a709f39a000, cur 1572842689 expire 1572842539 last 1572842462 [10314.759959] Lustre: Skipped 1 previous similar message [10339.518062] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) [10339.528497] Lustre: Skipped 11 previous similar messages [10464.961485] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting [10464.971663] Lustre: Skipped 1 previous similar message [10565.744650] Lustre: fir-MDT0002: haven't heard from client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a91b4c69c00, cur 1572842940 expire 1572842790 last 1572842713 [10565.766441] Lustre: Skipped 1 previous similar message [10740.936867] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting [10740.947044] Lustre: Skipped 1 previous similar message [10816.754389] Lustre: fir-MDT0002: haven't heard from client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6162bfd800, cur 1572843191 expire 1572843041 last 1572842964 [10816.776204] Lustre: Skipped 1 previous similar message [11042.000111] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting [11042.010291] Lustre: Skipped 1 previous similar message [11042.015456] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) [11042.025909] Lustre: Skipped 9 previous similar messages [11117.759329] Lustre: fir-MDT0002: haven't heard from client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a90e4249000, cur 1572843492 expire 1572843342 last 1572843265 [11117.781122] Lustre: Skipped 1 previous similar message [11343.063982] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting [11343.074200] Lustre: Skipped 1 previous similar message [11418.765904] Lustre: fir-MDT0002: haven't heard from client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9a749c1c9c00, cur 1572843793 expire 1572843643 last 1572843566 [11418.787695] Lustre: Skipped 1 previous similar message [11644.128084] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) [11644.138544] Lustre: Skipped 11 previous similar messages [11669.774609] Lustre: fir-MDT0002: haven't heard from client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a60fb414000, cur 1572844044 expire 1572843894 last 1572843817 [11669.796412] Lustre: Skipped 1 previous similar message [11895.015743] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting [11895.025919] Lustre: Skipped 5 previous similar messages [11995.786338] Lustre: fir-MDT0002: haven't heard from client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6dc066dc00, cur 1572844370 expire 1572844220 last 1572844143 [11995.808128] Lustre: Skipped 1 previous similar message [12271.345161] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) [12271.355600] Lustre: Skipped 9 previous similar messages [12497.798273] Lustre: fir-MDT0002: haven't heard from client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a70f12ea400, cur 1572844872 expire 1572844722 last 1572844645 [12497.820070] Lustre: Skipped 3 previous similar messages [12558.344339] Lustre: 43306:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1572844925/real 1572844925] req@ffff9a6037abe780 x1649240337962240/t0(0) o106->fir-MDT0002@10.8.27.3@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1572844932 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 [12565.371524] Lustre: 43306:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1572844932/real 1572844932] req@ffff9a6037abe780 x1649240337962240/t0(0) o106->fir-MDT0002@10.8.27.3@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1572844939 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [12572.398718] Lustre: 43306:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1572844939/real 1572844939] req@ffff9a6037abe780 x1649240337962240/t0(0) o106->fir-MDT0002@10.8.27.3@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1572844946 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [12579.425907] Lustre: 43306:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1572844946/real 1572844946] req@ffff9a6037abe780 x1649240337962240/t0(0) o106->fir-MDT0002@10.8.27.3@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1572844953 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [12586.454085] Lustre: 43306:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1572844953/real 1572844953] req@ffff9a6037abe780 x1649240337962240/t0(0) o106->fir-MDT0002@10.8.27.3@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1572844960 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [12600.482462] Lustre: 43306:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1572844967/real 1572844967] req@ffff9a6037abe780 x1649240337962240/t0(0) o106->fir-MDT0002@10.8.27.3@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1572844974 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [12600.509643] Lustre: 43306:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 1 previous similar 
message [12621.519034] Lustre: 43306:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1572844988/real 1572844988] req@ffff9a6037abe780 x1649240337962240/t0(0) o106->fir-MDT0002@10.8.27.3@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1572844995 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [12621.546210] Lustre: 43306:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 2 previous similar messages [12656.555976] Lustre: 43306:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1572845023/real 1572845023] req@ffff9a6037abe780 x1649240337962240/t0(0) o106->fir-MDT0002@10.8.27.3@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1572845030 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [12656.583149] Lustre: 43306:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 4 previous similar messages [12671.806805] LustreError: 43306:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) ### client (nid 10.8.27.3@o2ib6) failed to reply to glimpse AST (req@ffff9a6037abe780 x1649240337962240 status 0 rc -5), evict it ns: mdt-fir-MDT0002_UUID lock: ffff9a6d3f3845c0/0x3428b9d228bca07c lrc: 4/0,0 mode: PW/PW res: [0x2c00321f6:0xc3:0x0].0x0 bits 0x40/0x0 rrc: 8 type: IBT flags: 0x40200000000000 nid: 10.8.27.3@o2ib6 remote: 0x6c78687b73cbb839 expref: 15 pid: 43210 timeout: 0 lvb_type: 0 [12671.848467] LustreError: 138-a: fir-MDT0002: A client on nid 10.8.27.3@o2ib6 was evicted due to a lock glimpse callback time out: rc -5 [12722.941578] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting [12722.951750] Lustre: Skipped 5 previous similar messages [12873.473649] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) [12873.484080] Lustre: Skipped 7 previous similar messages [13099.811172] Lustre: fir-MDT0002: haven't heard from client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a90f5635000, cur 1572845474 expire 1572845324 last 1572845247 [13099.832987] Lustre: Skipped 4 previous similar messages [13325.069390] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting [13325.079566] Lustre: Skipped 5 previous similar messages [13575.956985] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) [13575.967421] Lustre: Skipped 11 previous similar messages [13927.834490] Lustre: fir-MDT0002: haven't heard from client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a70802bc000, cur 1572846302 expire 1572846152 last 1572846075 [13927.856285] Lustre: Skipped 5 previous similar messages [14102.818672] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting [14102.828847] Lustre: Skipped 5 previous similar messages [14253.350938] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) [14253.361371] Lustre: Skipped 9 previous similar messages [14704.947628] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting [14704.957808] Lustre: Skipped 3 previous similar messages [14780.857866] Lustre: fir-MDT0002: haven't heard from client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9a6029450000, cur 1572847155 expire 1572847005 last 1572846928 [14780.879734] Lustre: Skipped 5 previous similar messages [14855.479532] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) [14855.489969] Lustre: Skipped 8 previous similar messages [15512.042266] Lustre: fir-MDT0002: Client 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4) reconnecting [15512.052444] Lustre: Skipped 7 previous similar messages [15512.057697] Lustre: fir-MDT0002: Connection restored to 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4) [15512.068146] Lustre: Skipped 11 previous similar messages [15569.886371] Lustre: fir-MDT0002: haven't heard from client 7a5271e0-de70-a523-5bcd-71c5bfaf3872 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a7569b8c400, cur 1572847944 expire 1572847794 last 1572847717 [15569.907995] Lustre: Skipped 5 previous similar messages [16235.355082] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) [16235.365514] Lustre: Skipped 9 previous similar messages [16385.887158] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting [16385.897336] Lustre: Skipped 5 previous similar messages [16461.904410] Lustre: fir-MDT0002: haven't heard from client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a61be4c2000, cur 1572848836 expire 1572848686 last 1572848609 [16461.926200] Lustre: Skipped 6 previous similar messages [16837.482793] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) [16837.493230] Lustre: Skipped 9 previous similar messages [17063.916352] Lustre: fir-MDT0002: haven't heard from client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6f28b79400, cur 1572849438 expire 1572849288 last 1572849211 [17063.938159] Lustre: Skipped 3 previous similar messages [17213.812379] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting [17213.822552] Lustre: Skipped 7 previous similar messages [17464.699634] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) [17464.710069] Lustre: Skipped 9 previous similar messages [17841.941649] Lustre: fir-MDT0002: haven't heard from client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a600b1bac00, cur 1572850216 expire 1572850066 last 1572849989 [17841.963462] Lustre: Skipped 5 previous similar messages [18066.826290] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting [18066.836464] Lustre: Skipped 5 previous similar messages [18066.841731] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) [18066.852185] Lustre: Skipped 7 previous similar messages [18443.953190] Lustre: fir-MDT0002: haven't heard from client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9a51e1a00c00, cur 1572850818 expire 1572850668 last 1572850591 [18443.974986] Lustre: Skipped 3 previous similar messages [18769.308516] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) [18769.318954] Lustre: Skipped 11 previous similar messages [18894.751552] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting [18894.761727] Lustre: Skipped 7 previous similar messages [19246.973504] Lustre: fir-MDT0002: haven't heard from client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6fd16b6000, cur 1572851621 expire 1572851471 last 1572851394 [19246.995298] Lustre: Skipped 5 previous similar messages [19446.702589] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) [19446.713028] Lustre: Skipped 9 previous similar messages [19747.765729] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting [19747.775907] Lustre: Skipped 5 previous similar messages [20099.007057] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) [20099.017493] Lustre: Skipped 10 previous similar messages [20124.996826] Lustre: fir-MDT0002: haven't heard from client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a7a5d246400, cur 1572852499 expire 1572852349 last 1572852272 [20125.018617] Lustre: Skipped 6 previous similar messages [20575.691892] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting [20575.702069] Lustre: Skipped 7 previous similar messages [20701.135403] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) [20701.145838] Lustre: Skipped 9 previous similar messages [20928.020641] Lustre: fir-MDT0002: haven't heard from client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9a6e85a20c00, cur 1572853302 expire 1572853152 last 1572853075 [20928.042435] Lustre: Skipped 5 previous similar messages [21300.156462] Lustre: 43242:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1572853667/real 1572853667] req@ffff9a61b5738900 x1649240601224352/t0(0) o106->fir-MDT0002@10.9.107.9@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1572853674 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 [21300.183729] Lustre: 43242:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 2 previous similar messages [21314.193874] Lustre: 43242:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1572853681/real 1572853681] req@ffff9a61b5738900 x1649240601224352/t0(0) o106->fir-MDT0002@10.9.107.9@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1572853688 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [21314.221139] Lustre: 43242:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 1 previous similar message [21335.230469] Lustre: 43242:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1572853702/real 1572853702] req@ffff9a61b5738900 x1649240601224352/t0(0) o106->fir-MDT0002@10.9.107.9@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1572853709 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [21335.257719] Lustre: 43242:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 2 previous similar messages [21365.034931] LustreError: 43242:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) ### client (nid 10.9.107.9@o2ib4) failed to reply to glimpse AST (req@ffff9a61b5738900 x1649240601224352 status 0 rc -5), evict it ns: mdt-fir-MDT0002_UUID lock: ffff9a6fea774c80/0x3428b9d2449f148c lrc: 4/0,0 mode: PW/PW res: [0x2c00321c2:0xcb:0x0].0x0 bits 0x40/0x0 rrc: 7 type: IBT flags: 0x40200000000000 nid: 10.9.107.9@o2ib4 remote: 0xc3d25e45c455188c expref: 20 pid: 43210 timeout: 0 lvb_type: 0 [21365.076759] LustreError: 138-a: fir-MDT0002: A client on nid 10.9.107.9@o2ib4 was evicted due to a lock glimpse callback time out: rc -5 [21428.706294] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting [21428.716468] Lustre: Skipped 5 previous similar messages [21428.721722] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) [21428.732172] Lustre: Skipped 9 previous similar messages [21806.045222] Lustre: fir-MDT0002: haven't heard from client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6d0de6f000, cur 1572854180 expire 1572854030 last 1572853953 [21806.067015] Lustre: Skipped 6 previous similar messages [22131.189034] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) [22131.199472] Lustre: Skipped 11 previous similar messages [22256.631791] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting [22256.641966] Lustre: Skipped 7 previous similar messages [22609.061612] Lustre: fir-MDT0002: haven't heard from client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9a8fa7a87400, cur 1572854983 expire 1572854833 last 1572854756 [22609.083404] Lustre: Skipped 5 previous similar messages [22808.582208] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) [22808.592643] Lustre: Skipped 9 previous similar messages [23109.646186] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting [23109.656366] Lustre: Skipped 5 previous similar messages [23410.710336] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) [23410.720772] Lustre: Skipped 8 previous similar messages [23487.089797] Lustre: fir-MDT0002: haven't heard from client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a90fe157800, cur 1572855861 expire 1572855711 last 1572855634 [23487.111590] Lustre: Skipped 5 previous similar messages [23711.774381] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting [23711.784579] Lustre: Skipped 6 previous similar messages [24067.273276] Lustre: fir-MDT0002: Connection restored to 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4) [24067.283720] Lustre: Skipped 10 previous similar messages [24294.112702] Lustre: fir-MDT0002: haven't heard from client 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a60311a0400, cur 1572856668 expire 1572856518 last 1572856441 [24294.134532] Lustre: Skipped 5 previous similar messages [24489.522614] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting [24489.532786] Lustre: Skipped 4 previous similar messages [24790.586234] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) [24790.596667] Lustre: Skipped 9 previous similar messages [25091.650254] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting [25091.660427] Lustre: Skipped 3 previous similar messages [25167.155307] Lustre: fir-MDT0002: haven't heard from client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a787274fc00, cur 1572857541 expire 1572857391 last 1572857314 [25167.177096] Lustre: Skipped 6 previous similar messages [25392.714092] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) [25392.724527] Lustre: Skipped 9 previous similar messages [25894.487475] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting [25894.497646] Lustre: Skipped 7 previous similar messages [25995.168651] Lustre: fir-MDT0002: haven't heard from client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9a782d9e3c00, cur 1572858369 expire 1572858219 last 1572858142 [25995.190443] Lustre: Skipped 5 previous similar messages [26019.931462] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) [26019.941897] Lustre: Skipped 9 previous similar messages [26622.058758] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) [26622.069198] Lustre: Skipped 7 previous similar messages [26772.590678] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting [26772.600855] Lustre: Skipped 5 previous similar messages [26848.175318] Lustre: fir-MDT0002: haven't heard from client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a708da32c00, cur 1572859222 expire 1572859072 last 1572858995 [26848.197129] Lustre: Skipped 5 previous similar messages [27324.541171] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) [27324.551607] Lustre: Skipped 11 previous similar messages [27575.427915] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting [27575.438090] Lustre: Skipped 7 previous similar messages [27676.230264] Lustre: fir-MDT0002: haven't heard from client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a5fa62b8000, cur 1572860050 expire 1572859900 last 1572859823 [27676.252049] Lustre: Skipped 5 previous similar messages [28001.935333] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) [28001.945772] Lustre: Skipped 9 previous similar messages [28453.531123] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting [28453.541301] Lustre: Skipped 5 previous similar messages [28529.237298] Lustre: fir-MDT0002: haven't heard from client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a8facf96800, cur 1572860903 expire 1572860753 last 1572860676 [28529.259090] Lustre: Skipped 5 previous similar messages [28654.240291] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) [28654.250733] Lustre: Skipped 9 previous similar messages [29256.368903] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting [29256.379080] Lustre: Skipped 7 previous similar messages [29256.384332] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) [29256.394784] Lustre: Skipped 9 previous similar messages [29357.243109] Lustre: fir-MDT0002: haven't heard from client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9a7bfbd6a000, cur 1572861731 expire 1572861581 last 1572861504 [29357.264907] Lustre: Skipped 5 previous similar messages [29983.939860] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) [29983.950298] Lustre: Skipped 9 previous similar messages [30134.471271] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting [30134.481446] Lustre: Skipped 5 previous similar messages [30210.264918] Lustre: fir-MDT0002: haven't heard from client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a90fdf12c00, cur 1572862584 expire 1572862434 last 1572862357 [30210.286737] Lustre: Skipped 5 previous similar messages [30686.421965] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) [30686.432400] Lustre: Skipped 11 previous similar messages [30937.308805] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting [30937.318984] Lustre: Skipped 7 previous similar messages [31038.286229] Lustre: fir-MDT0002: haven't heard from client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6dd8a62000, cur 1572863412 expire 1572863262 last 1572863185 [31038.308065] Lustre: Skipped 5 previous similar messages [31363.815722] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) [31363.826151] Lustre: Skipped 9 previous similar messages [31815.411508] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting [31815.421716] Lustre: Skipped 5 previous similar messages [31891.308567] Lustre: fir-MDT0002: haven't heard from client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a739598c800, cur 1572864265 expire 1572864115 last 1572864038 [31891.330359] Lustre: Skipped 5 previous similar messages [31965.944019] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) [31965.954457] Lustre: Skipped 8 previous similar messages [32622.506971] Lustre: fir-MDT0002: Client 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4) reconnecting [32622.517145] Lustre: Skipped 7 previous similar messages [32622.522402] Lustre: fir-MDT0002: Connection restored to 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4) [32622.532850] Lustre: Skipped 10 previous similar messages [32723.363004] Lustre: fir-MDT0002: haven't heard from client 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a81ad280c00, cur 1572865097 expire 1572864947 last 1572864870 [32723.384888] Lustre: Skipped 5 previous similar messages [33345.820251] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) [33345.830689] Lustre: Skipped 9 previous similar messages [33496.352605] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting [33496.362779] Lustre: Skipped 5 previous similar messages [33572.349898] Lustre: fir-MDT0002: haven't heard from client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9a7b0df8a800, cur 1572865946 expire 1572865796 last 1572865719
[33572.371689] Lustre: Skipped 6 previous similar messages
[33947.948492] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4)
[33947.958926] Lustre: Skipped 9 previous similar messages
[34174.377240] Lustre: fir-MDT0002: haven't heard from client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a7262521c00, cur 1572866548 expire 1572866398 last 1572866321
[34174.399033] Lustre: Skipped 3 previous similar messages
[34324.278364] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting
[34324.288545] Lustre: Skipped 7 previous similar messages
[34575.164837] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4)
[34575.175274] Lustre: Skipped 9 previous similar messages
[34952.394369] Lustre: fir-MDT0002: haven't heard from client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a78dc4f7800, cur 1572867326 expire 1572867176 last 1572867099
[34952.416164] Lustre: Skipped 5 previous similar messages
[35177.292913] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting
[35177.303088] Lustre: Skipped 5 previous similar messages
[35177.308344] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4)
[35177.318797] Lustre: Skipped 8 previous similar messages
[35554.414622] Lustre: fir-MDT0002: haven't heard from client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a9087f2dc00, cur 1572867928 expire 1572867778 last 1572867701
[35554.436412] Lustre: Skipped 3 previous similar messages
[35879.775746] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4)
[35879.786182] Lustre: Skipped 11 previous similar messages
[36005.218922] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting
[36005.229096] Lustre: Skipped 7 previous similar messages
[36357.441537] Lustre: fir-MDT0002: haven't heard from client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6000ae9c00, cur 1572868731 expire 1572868581 last 1572868504
[36357.463349] Lustre: Skipped 5 previous similar messages
[36557.170271] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4)
[36557.180704] Lustre: Skipped 9 previous similar messages
[36858.234025] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting
[36858.244202] Lustre: Skipped 5 previous similar messages
[37209.475088] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4)
[37209.485530] Lustre: Skipped 9 previous similar messages
[37235.463094] Lustre: fir-MDT0002: haven't heard from client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a91431d7400, cur 1572869609 expire 1572869459 last 1572869382
[37235.484883] Lustre: Skipped 6 previous similar messages
[37686.159313] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting
[37686.169486] Lustre: Skipped 7 previous similar messages
[37811.602354] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4)
[37811.612833] Lustre: Skipped 9 previous similar messages
[38038.470460] Lustre: fir-MDT0002: haven't heard from client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a7893643000, cur 1572870412 expire 1572870262 last 1572870185
[38038.492263] Lustre: Skipped 5 previous similar messages
[38539.172630] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting
[38539.182807] Lustre: Skipped 5 previous similar messages
[38539.188058] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4)
[38539.198495] Lustre: Skipped 11 previous similar messages
[38916.493093] Lustre: fir-MDT0002: haven't heard from client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a616b2a1800, cur 1572871290 expire 1572871140 last 1572871063
[38916.514884] Lustre: Skipped 6 previous similar messages
[38975.589315] LNetError: 39559:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5)
[39077.486748] perf: interrupt took too long (2582 > 2500), lowering kernel.perf_event_max_sample_rate to 77000
[39241.654503] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4)
[39241.664935] Lustre: Skipped 11 previous similar messages
[39367.097919] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting
[39367.108099] Lustre: Skipped 7 previous similar messages
[39719.510837] Lustre: fir-MDT0002: haven't heard from client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a90f9ee0c00, cur 1572872093 expire 1572871943 last 1572871866
[39719.532654] Lustre: Skipped 6 previous similar messages
[39762.310929] LNetError: 39559:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5)
[39919.047885] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4)
[39919.058330] Lustre: Skipped 9 previous similar messages
[40220.111500] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting
[40220.121683] Lustre: Skipped 5 previous similar messages
[40521.175397] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4)
[40521.185836] Lustre: Skipped 8 previous similar messages
[40597.534226] Lustre: fir-MDT0002: haven't heard from client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a5d468d7400, cur 1572872971 expire 1572872821 last 1572872744
[40597.556036] Lustre: Skipped 5 previous similar messages
[40822.238945] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting
[40822.249122] Lustre: Skipped 6 previous similar messages
[41177.738389] Lustre: fir-MDT0002: Connection restored to 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4)
[41177.748847] Lustre: Skipped 11 previous similar messages
[41269.552618] Lustre: fir-MDT0002: haven't heard from client e9939ec5-4701-835c-b688-b9f296cafced (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a7d357a2400, cur 1572873643 expire 1572873493 last 1572873416
[41269.574251] Lustre: Skipped 5 previous similar messages
[41599.987297] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting
[41599.997476] Lustre: Skipped 4 previous similar messages
[41901.051343] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4)
[41901.061786] Lustre: Skipped 10 previous similar messages
[41976.570005] Lustre: fir-MDT0002: haven't heard from client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a5fbf99e400, cur 1572874350 expire 1572874200 last 1572874123
[41976.591881] Lustre: Skipped 5 previous similar messages
[42202.115218] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting
[42202.125391] Lustre: Skipped 4 previous similar messages
[42503.179096] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4)
[42503.189559] Lustre: Skipped 10 previous similar messages
[42798.934912] LustreError: 40361:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.9.116.10@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff9a5e90a03600/0x3428b9d2a159fd27 lrc: 3/0,0 mode: CR/CR res: [0x2c0032d34:0x101:0x0].0x0 bits 0x9/0x0 rrc: 5 type: IBT flags: 0x60200400000020 nid: 10.9.116.10@o2ib4 remote: 0x7ae7df7053bbcddf expref: 85 pid: 43442 timeout: 42797 lvb_type: 0
[42798.972727] LustreError: 42021:0:(ldlm_lockd.c:1348:ldlm_handle_enqueue0()) ### lock on destroyed export ffff9a71b6be0400 ns: mdt-fir-MDT0002_UUID lock: ffff9a7bfcb3a880/0x3428b9d2a20c2426 lrc: 3/0,0 mode: EX/EX res: [0x2c0032d34:0x102:0x0].0x0 bits 0x8/0x0 rrc: 3 type: IBT flags: 0x50000000000000 nid: 10.9.116.10@o2ib4 remote: 0x7ae7df7053bbd20e expref: 55 pid: 42021 timeout: 0 lvb_type: 3
[42834.593018] Lustre: fir-MDT0002: haven't heard from client 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a8e6f423400, cur 1572875208 expire 1572875058 last 1572874981
[42834.614831] Lustre: Skipped 5 previous similar messages
[42984.121656] Lustre: fir-MDT0002: Client 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4) reconnecting
[42984.131824] Lustre: Skipped 6 previous similar messages
[43130.395516] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4)
[43130.405958] Lustre: Skipped 9 previous similar messages
[43274.504483] perf: interrupt took too long (3248 > 3227), lowering kernel.perf_event_max_sample_rate to 61000
[43586.249458] Lustre: fir-MDT0002: Client 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4) reconnecting
[43586.259642] Lustre: Skipped 4 previous similar messages
[43657.614234] Lustre: fir-MDT0002: haven't heard from client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6022b79800, cur 1572876031 expire 1572875881 last 1572875804
[43657.636050] Lustre: Skipped 5 previous similar messages
[43732.523621] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4)
[43732.534125] Lustre: Skipped 7 previous similar messages
[44414.175204] Lustre: fir-MDT0002: Client 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4) reconnecting
[44414.185383] Lustre: Skipped 6 previous similar messages
[44414.190711] Lustre: fir-MDT0002: Connection restored to 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4)
[44414.201183] Lustre: Skipped 11 previous similar messages
[44515.636702] Lustre: fir-MDT0002: haven't heard from client 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6ffc471000, cur 1572876889 expire 1572876739 last 1572876662
[44515.658577] Lustre: Skipped 5 previous similar messages
[44656.050183] LNetError: 39559:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5)
[45112.399324] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4)
[45112.409814] Lustre: Skipped 10 previous similar messages
[45136.652254] Lustre: fir-MDT0002: haven't heard from client 2d3dc098-a3a6-505c-2e74-050299bda746 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6f25838000, cur 1572877510 expire 1572877360 last 1572877283
[45136.673902] Lustre: Skipped 5 previous similar messages
[45262.931311] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting
[45262.941491] Lustre: Skipped 5 previous similar messages
[45718.785411] Lustre: fir-MDT0002: Connection restored to 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4)
[45718.795847] Lustre: Skipped 9 previous similar messages
[45865.059419] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting
[45865.069593] Lustre: Skipped 6 previous similar messages
[45890.672658] Lustre: fir-MDT0002: haven't heard from client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6ff9540c00, cur 1572878264 expire 1572878114 last 1572878037
[45890.694453] Lustre: Skipped 4 previous similar messages
[46346.002183] Lustre: fir-MDT0002: Connection restored to 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4)
[46346.012617] Lustre: Skipped 9 previous similar messages
[46642.808734] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting
[46642.818908] Lustre: Skipped 4 previous similar messages
[46718.697386] Lustre: fir-MDT0002: haven't heard from client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6c04204000, cur 1572879092 expire 1572878942 last 1572878865
[46718.719178] Lustre: Skipped 5 previous similar messages
[46948.130197] Lustre: fir-MDT0002: Connection restored to 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4)
[46948.140636] Lustre: Skipped 8 previous similar messages
[47244.936787] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting
[47244.946963] Lustre: Skipped 4 previous similar messages
[47320.719201] Lustre: fir-MDT0002: haven't heard from client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6041b80000, cur 1572879694 expire 1572879544 last 1572879467
[47320.740988] Lustre: Skipped 3 previous similar messages
[47650.612504] Lustre: fir-MDT0002: Connection restored to 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4)
[47650.622942] Lustre: Skipped 11 previous similar messages
[47922.329853] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting
[47922.340023] Lustre: Skipped 5 previous similar messages
[47990.727590] Lustre: fir-MDT0002: haven't heard from client 04e02496-f20c-90ce-9222-372af38bc441 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a7c8cc2f400, cur 1572880364 expire 1572880214 last 1572880137
[47990.749226] Lustre: Skipped 6 previous similar messages
[48323.748502] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4)
[48323.758940] Lustre: Skipped 10 previous similar messages
[48629.069900] Lustre: fir-MDT0002: Client 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4) reconnecting
[48629.080079] Lustre: Skipped 4 previous similar messages
[48704.745892] Lustre: fir-MDT0002: haven't heard from client 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a90eb7b5c00, cur 1572881078 expire 1572880928 last 1572880851
[48704.767704] Lustre: Skipped 5 previous similar messages
[48925.875939] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4)
[48925.886383] Lustre: Skipped 8 previous similar messages
[49277.116557] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting
[49277.126731] Lustre: Skipped 5 previous similar messages
[49306.763487] Lustre: fir-MDT0002: haven't heard from client 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a5c896a9000, cur 1572881680 expire 1572881530 last 1572881453
[49306.785281] Lustre: Skipped 4 previous similar messages
[49582.438453] Lustre: fir-MDT0002: Connection restored to 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4)
[49582.448884] Lustre: Skipped 10 previous similar messages
[49879.244636] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting
[49879.254812] Lustre: Skipped 4 previous similar messages
[49980.779876] Lustre: fir-MDT0002: haven't heard from client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a7a766ff000, cur 1572882354 expire 1572882204 last 1572882127
[49980.801682] Lustre: Skipped 4 previous similar messages
[50305.751545] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4)
[50305.761988] Lustre: Skipped 9 previous similar messages
[50560.895783] Lustre: fir-MDT0002: Client 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4) reconnecting
[50560.905959] Lustre: Skipped 4 previous similar messages
[50642.797282] Lustre: fir-MDT0002: haven't heard from client 5cd4d900-c3f5-563e-9d0d-60c2c7baeee9 (at 10.9.110.67@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a81a2c20400, cur 1572883016 expire 1572882866 last 1572882789
[50642.819090] Lustre: Skipped 4 previous similar messages
[50958.056179] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4)
[50958.066616] Lustre: Skipped 10 previous similar messages
[51263.377858] Lustre: fir-MDT0002: Client 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4) reconnecting
[51263.388039] Lustre: Skipped 5 previous similar messages
[51364.814604] Lustre: fir-MDT0002: haven't heard from client 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6f867c9400, cur 1572883738 expire 1572883588 last 1572883511
[51364.836392] Lustre: Skipped 24 previous similar messages
[51560.183813] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4)
[51560.194252] Lustre: Skipped 9 previous similar messages
[52137.222693] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting
[52137.232865] Lustre: Skipped 5 previous similar messages
[52172.663128] Lustre: fir-MDT0002: Connection restored to e05cc48d-6b25-1271-2d5a-f23d00ab4bcf (at 10.8.24.2@o2ib6)
[52172.673387] Lustre: Skipped 39 previous similar messages
[52212.836707] Lustre: fir-MDT0002: haven't heard from client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6f29479400, cur 1572884586 expire 1572884436 last 1572884359
[52212.858533] Lustre: Skipped 5 previous similar messages
[52739.350310] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting
[52739.360485] Lustre: Skipped 6 previous similar messages
[52818.874445] Lustre: fir-MDT0002: Connection restored to 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4)
[52818.884883] Lustre: Skipped 39 previous similar messages
[53045.859949] Lustre: fir-MDT0002: haven't heard from client 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a9105df4c00, cur 1572885419 expire 1572885269 last 1572885192
[53045.881736] Lustre: Skipped 5 previous similar messages
[53517.098871] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting
[53517.109047] Lustre: Skipped 4 previous similar messages
[53517.114297] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4)
[53517.124732] Lustre: Skipped 11 previous similar messages
[53893.882999] Lustre: fir-MDT0002: haven't heard from client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6bd67cc000, cur 1572886267 expire 1572886117 last 1572886040
[53893.904809] Lustre: Skipped 6 previous similar messages
[54119.225838] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting
[54119.236016] Lustre: Skipped 4 previous similar messages
[54119.241273] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4)
[54119.251725] Lustre: Skipped 11 previous similar messages
[54725.903634] Lustre: fir-MDT0002: haven't heard from client 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6fc619f000, cur 1572887099 expire 1572886949 last 1572886872
[54725.925441] Lustre: Skipped 6 previous similar messages
[54750.699485] Lustre: fir-MDT0002: Connection restored to 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4)
[54750.709920] Lustre: Skipped 11 previous similar messages
[54901.231200] Lustre: fir-MDT0002: Client 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4) reconnecting
[54901.241382] Lustre: Skipped 6 previous similar messages
[55352.826383] Lustre: fir-MDT0002: Connection restored to 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4)
[55352.836821] Lustre: Skipped 9 previous similar messages
[55503.358103] Lustre: fir-MDT0002: Client 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4) reconnecting
[55503.368273] Lustre: Skipped 4 previous similar messages
[55574.942200] Lustre: fir-MDT0002: haven't heard from client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a51e8a33400, cur 1572887948 expire 1572887798 last 1572887721
[55574.963992] Lustre: Skipped 6 previous similar messages
[55886.274219] LustreError: 40361:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.9.103.53@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff9a5eea6057c0/0x3428b9d2e97b844b lrc: 3/0,0 mode: PW/PW res: [0x2c0032e40:0x1:0x0].0x0 bits 0x40/0x0 rrc: 5 type: IBT flags: 0x60200400000020 nid: 10.9.103.53@o2ib4 remote: 0x51ab3c4efd129db expref: 24 pid: 43285 timeout: 55884 lvb_type: 0
[56000.872794] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4)
[56000.883257] Lustre: Skipped 10 previous similar messages
[56306.194651] Lustre: fir-MDT0002: Client 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4) reconnecting
[56306.204826] Lustre: Skipped 6 previous similar messages
[56406.949827] Lustre: fir-MDT0002: haven't heard from client 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a608634b800, cur 1572888780 expire 1572888630 last 1572888553
[56406.971611] Lustre: Skipped 6 previous similar messages
[56603.001017] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4)
[56603.011451] Lustre: Skipped 9 previous similar messages
[57180.039791] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting
[57180.049963] Lustre: Skipped 5 previous similar messages
[57255.970667] Lustre: fir-MDT0002: haven't heard from client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6154d62400, cur 1572889629 expire 1572889479 last 1572889402
[57255.992455] Lustre: Skipped 14 previous similar messages
[57330.571684] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4)
[57330.582116] Lustre: Skipped 8 previous similar messages
[57782.168309] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting
[57782.178502] Lustre: Skipped 6 previous similar messages
[57933.987823] Lustre: fir-MDT0002: Connection restored to c26cf02d-64c5-f21f-4c20-7ceab0976d33 (at 10.9.109.63@o2ib4)
[57933.998265] Lustre: Skipped 14 previous similar messages
[58112.992030] Lustre: fir-MDT0002: haven't heard from client 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a9110e32800, cur 1572890486 expire 1572890336 last 1572890259
[58113.013837] Lustre: Skipped 5 previous similar messages
[58559.916830] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting
[58559.927005] Lustre: Skipped 4 previous similar messages
[58559.932257] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4)
[58559.942707] Lustre: Skipped 17 previous similar messages
[58937.012491] Lustre: fir-MDT0002: haven't heard from client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a616bb9cc00, cur 1572891310 expire 1572891160 last 1572891083
[58937.034281] Lustre: Skipped 5 previous similar messages
[59162.045291] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting
[59162.055487] Lustre: Skipped 4 previous similar messages
[59162.060741] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4)
[59162.071175] Lustre: Skipped 8 previous similar messages
[59794.045218] Lustre: fir-MDT0002: haven't heard from client 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a7023d1bc00, cur 1572892167 expire 1572892017 last 1572891940
[59794.067008] Lustre: Skipped 5 previous similar messages
[59818.607885] Lustre: fir-MDT0002: Connection restored to 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4)
[59818.618321] Lustre: Skipped 10 previous similar messages
[59944.051097] Lustre: fir-MDT0002: Client 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4) reconnecting
[59944.061278] Lustre: Skipped 6 previous similar messages
[60541.920511] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4)
[60541.930947] Lustre: Skipped 9 previous similar messages
[60546.178539] Lustre: fir-MDT0002: Client 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4) reconnecting
[60546.188711] Lustre: Skipped 4 previous similar messages
[60618.055415] Lustre: fir-MDT0002: haven't heard from client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a805072d400, cur 1572892991 expire 1572892841 last 1572892764
[60618.077204] Lustre: Skipped 5 previous similar messages
[61136.666047] Lustre: 43170:0:(mdd_device.c:1807:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22
[61137.165572] Lustre: 43420:0:(mdd_device.c:1807:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22
[61137.177228] Lustre: 43420:0:(mdd_device.c:1807:mdd_changelog_clear()) Skipped 1274 previous similar messages
[61138.165436] Lustre: 43170:0:(mdd_device.c:1807:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22
[61138.177083] Lustre: 43170:0:(mdd_device.c:1807:mdd_changelog_clear()) Skipped 2355 previous similar messages
[61140.165786] Lustre: 43232:0:(mdd_device.c:1807:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22
[61140.177457] Lustre: 43232:0:(mdd_device.c:1807:mdd_changelog_clear()) Skipped 4926 previous similar messages
[61144.048328] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4)
[61144.058770] Lustre: Skipped 10 previous similar messages
[61144.166014] Lustre: 43232:0:(mdd_device.c:1807:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22
[61144.177668] Lustre: 43232:0:(mdd_device.c:1807:mdd_changelog_clear()) Skipped 8443 previous similar messages
[61152.165868] Lustre: 43432:0:(mdd_device.c:1807:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22
[61152.177523] Lustre: 43432:0:(mdd_device.c:1807:mdd_changelog_clear()) Skipped 3396 previous similar messages
[61168.965316] Lustre: 43164:0:(mdd_device.c:1807:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22
[61168.976965] Lustre: 43164:0:(mdd_device.c:1807:mdd_changelog_clear()) Skipped 4758 previous similar messages
[61202.063594] Lustre: 43432:0:(mdd_device.c:1807:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22
[61202.075247] Lustre: 43432:0:(mdd_device.c:1807:mdd_changelog_clear()) Skipped 14456 previous similar messages
[61266.230074] Lustre: 43242:0:(mdd_device.c:1807:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22
[61266.241723] Lustre: 43242:0:(mdd_device.c:1807:mdd_changelog_clear()) Skipped 25655 previous similar messages
[61374.103846] Lustre: fir-MDT0002: Client 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4) reconnecting
[61374.114026] Lustre: Skipped 6 previous similar messages
[61454.080745] Lustre: fir-MDT0002: haven't heard from client b80dc074-a5c6-d02a-f4bd-2a582232f7f1 (at 10.8.27.5@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a73a580f000, cur 1572893827 expire 1572893677 last 1572893600
[61454.102365] Lustre: Skipped 5 previous similar messages
[61771.264210] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4)
[61771.274647] Lustre: Skipped 8 previous similar messages
[62222.860396] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting
[62222.870567] Lustre: Skipped 5 previous similar messages
[62299.098506] Lustre: fir-MDT0002: haven't heard from client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a78fa8da000, cur 1572894672 expire 1572894522 last 1572894445
[62299.120295] Lustre: Skipped 6 previous similar messages
[62373.391885] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4)
[62373.402325] Lustre: Skipped 7 previous similar messages
[62824.987498] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting
[62824.997673] Lustre: Skipped 6 previous similar messages
[63055.043308] Lustre: fir-MDT0002: Connection restored to 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4)
[63055.053743] Lustre: Skipped 11 previous similar messages
[63156.119868] Lustre: fir-MDT0002: haven't heard from client 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a78b48dc800, cur 1572895529 expire 1572895379 last 1572895302
[63156.141661] Lustre: Skipped 5 previous similar messages
[63602.735493] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting
[63602.745668] Lustre: Skipped 4 previous similar messages
[63753.267954] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4)
[63753.278392] Lustre: Skipped 9 previous similar messages
[63980.149104] Lustre: fir-MDT0002: haven't heard from client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a8c7bf58c00, cur 1572896353 expire 1572896203 last 1572896126
[63980.170897] Lustre: Skipped 5 previous similar messages
[64204.863216] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting
[64204.873389] Lustre: Skipped 4 previous similar messages
[64355.395579] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4)
[64355.406017] Lustre: Skipped 8 previous similar messages
[64582.160068] Lustre: fir-MDT0002: haven't heard from client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a7395956000, cur 1572896955 expire 1572896805 last 1572896728
[64582.181888] Lustre: Skipped 3 previous similar messages
[64986.868336] Lustre: fir-MDT0002: Client 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4) reconnecting
[64986.878514] Lustre: Skipped 6 previous similar messages
[64986.883785] Lustre: fir-MDT0002: Connection restored to 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4)
[64986.894232] Lustre: Skipped 10 previous similar messages
[65364.190945] Lustre: fir-MDT0002: haven't heard from client 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a8ec72c3800, cur 1572897737 expire 1572897587 last 1572897510
[65364.212753] Lustre: Skipped 5 previous similar messages
[65588.995347] Lustre: fir-MDT0002: Client 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4) reconnecting
[65589.005527] Lustre: Skipped 4 previous similar messages
[65589.010789] Lustre: fir-MDT0002: Connection restored to 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4)
[65589.021247] Lustre: Skipped 8 previous similar messages
[66191.122433] Lustre: fir-MDT0002: Client 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4) reconnecting
[66191.132608] Lustre: Skipped 4 previous similar messages
[66191.137860] Lustre: fir-MDT0002: Connection restored to 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4)
[66191.148330] Lustre: Skipped 8 previous similar messages
[66217.197475] Lustre: fir-MDT0002: haven't heard from client 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a5d19b22400, cur 1572898590 expire 1572898440 last 1572898363
[66217.219289] Lustre: Skipped 11 previous similar messages
[66818.338160] Lustre: fir-MDT0002: Connection restored to 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4)
[66818.348595] Lustre: Skipped 9 previous similar messages
[66964.612144] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting
[66964.622320] Lustre: Skipped 5 previous similar messages
[67045.231544] Lustre: fir-MDT0002: haven't heard from client 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6b843d9400, cur 1572899418 expire 1572899268 last 1572899191
[67045.253362] Lustre: Skipped 5 previous similar messages
[67420.465134] Lustre: fir-MDT0002: Connection restored to 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4)
[67420.475589] Lustre: Skipped 10 previous similar messages
[67566.738766] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting
[67566.748938] Lustre: Skipped 3 previous similar messages
[67898.245336] Lustre: fir-MDT0002: haven't heard from client 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a8e266db800, cur 1572900271 expire 1572900121 last 1572900044
[67898.267150] Lustre: Skipped 5 previous similar messages
[68031.908015] Lustre: fir-MDT0002: Connection restored to (at 10.9.114.1@o2ib4)
[68031.915247] Lustre: Skipped 28 previous similar messages
[68088.164274] Lustre: 43358:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1572900453/real 1572900453] req@ffff9a787469ad00 x1649241991548544/t0(0) o104->fir-MDT0002@10.9.117.35@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1572900460 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[68088.191608] Lustre: 43358:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 4 previous similar messages
[68095.197449] Lustre: 43353:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1572900460/real 1572900460] req@ffff9a619e52d580 x1649241991548784/t0(0) o104->fir-MDT0002@10.9.117.35@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1572900467 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[68095.224808] Lustre: 43353:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 1 previous similar message
[68103.528660] Lustre: 43389:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1572900469/real 1572900469] req@ffff9a9107053600 x1649241991565104/t0(0) o104->fir-MDT0002@10.9.117.26@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1572900476 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[68103.556046] Lustre: 43389:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
[68104.248644] LustreError: 43358:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) ### client (nid 10.9.117.35@o2ib4) failed to reply to blocking AST (req@ffff9a787469ad00 x1649241991548544 status 0 rc -5), evict it ns: mdt-fir-MDT0002_UUID lock: ffff9a7e074ab600/0x3428b9d2f91cb0ce lrc: 4/0,0 mode: PR/PR res: [0x2c0032cd8:0xb4:0x0].0x0 bits 0x1b/0x0 rrc: 7 type: IBT flags: 0x60200400000020 nid: 10.9.117.35@o2ib4 remote: 0x9e459b7e52acf4ee expref: 133 pid: 43353 timeout: 68249 lvb_type: 0
[68104.248648] LustreError: 138-a: fir-MDT0002: A client on nid 10.9.117.35@o2ib4 was evicted due to a lock blocking callback time out: rc -5
[68104.249484] LustreError: 40361:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 8s: evicting client at 10.9.116.13@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff9a8cbe970fc0/0x3428b9d2e605e90f lrc: 3/0,0 mode: PW/PW res: [0x2c00323be:0x7f:0x0].0x0 bits 0x40/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.9.116.13@o2ib4 remote: 0x1e59fafd3fb921a expref: 112 pid: 43362 timeout: 0 lvb_type: 0
[68104.340753] LustreError: 43358:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) Skipped 6 previous similar messages
[68126.327231] Lustre: 43269:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1572900492/real 1572900492] req@ffff9a5dff761b00 x1649241991645104/t0(0) o104->fir-MDT0002@10.9.116.15@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1572900499 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[68126.354592] Lustre: 43269:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 6 previous similar messages
[68161.365096] Lustre: 43269:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1572900527/real 1572900527] req@ffff9a5dff761b00 x1649241991645104/t0(0) o104->fir-MDT0002@10.9.116.15@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1572900534 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[68161.392451] Lustre: 43269:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 4 previous similar messages
[68180.250646] LustreError: 43269:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) ### client (nid 10.9.116.15@o2ib4) failed to reply to blocking AST (req@ffff9a5dff761b00 x1649241991645104 status 0 rc -5), evict it ns: mdt-fir-MDT0002_UUID lock: ffff9a7ecf340240/0x3428b9d2f8c7a045 lrc: 4/0,0 mode: PR/PR res: [0x2c0031bfb:0xa8bb:0x0].0x0 bits 0x1b/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.9.116.15@o2ib4 remote: 0x98f5630e775298bf expref: 52 pid: 43531 timeout: 68322 lvb_type: 0
[68180.250809] LustreError: 138-a: fir-MDT0002: A client on nid 10.9.106.72@o2ib4 was evicted due to a lock blocking callback time out: rc -5
[68180.250811] LustreError: Skipped 6 previous similar messages
[68180.311370] LustreError: 43269:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) Skipped 1 previous similar message
[68269.220305] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting
[68269.230486] Lustre: Skipped 6 previous similar messages
[68621.259119] Lustre: fir-MDT0002: haven't heard from client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a5fee462800, cur 1572900994 expire 1572900844 last 1572900767
[68621.280917] Lustre: Skipped 69 previous similar messages
[68645.550367] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4)
[68645.560802] Lustre: Skipped 12 previous similar messages
[68950.871346] Lustre: fir-MDT0002: Client 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4) reconnecting
[68950.881518] Lustre: Skipped 4 previous similar messages
[69247.677749] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4)
[69247.688185] Lustre: Skipped 8 previous similar messages
[69328.276946] Lustre: fir-MDT0002: haven't heard from client 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a8fd1202400, cur 1572901701 expire 1572901551 last 1572901474
[69328.298734] Lustre: Skipped 4 previous similar messages
[69552.998445] Lustre: fir-MDT0002: Client 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4) reconnecting
[69553.008616] Lustre: Skipped 4 previous similar messages
[69717.092975] LustreError: 43307:0:(upcall_cache.c:233:upcall_cache_get_entry()) acquire for key 299034: error -110
[69726.637215] LustreError: 43507:0:(upcall_cache.c:233:upcall_cache_get_entry()) acquire for key 329911: error -110
[69747.104725] LustreError: 43307:0:(upcall_cache.c:233:upcall_cache_get_entry()) acquire for key 299034: error -110
[69756.875971] LustreError: 43507:0:(upcall_cache.c:233:upcall_cache_get_entry()) acquire for key 329911: error -110
[69859.868353] Lustre: fir-MDT0002: Connection restored to 59e8db77-b4dd-5b19-e958-27852ef95626 (at 10.9.106.48@o2ib4)
[69859.878792] Lustre: Skipped 131 previous similar messages
[70051.301584] Lustre: fir-MDT0002: haven't heard from client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a5c1d186800, cur 1572902424 expire 1572902274 last 1572902197
[70051.323487] Lustre: Skipped 8 previous similar messages
[70201.044606] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting
[70201.054784] Lustre: Skipped 5 previous similar messages
[70477.019802] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4)
[70477.030239] Lustre: Skipped 15 previous similar messages
[70707.326416] Lustre: fir-MDT0002: haven't heard from client 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6dfab88800, cur 1572903080 expire 1572902930 last 1572902853
[70707.348207] Lustre: Skipped 4 previous similar messages
[70932.872380] Lustre: fir-MDT0002: Client 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4) reconnecting
[70932.882565] Lustre: Skipped 4 previous similar messages
[71079.147988] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4)
[71079.158425] Lustre: Skipped 11 previous similar messages
[71406.331725] Lustre: fir-MDT0002: haven't heard from client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a850c871000, cur 1572903779 expire 1572903629 last 1572903552
[71406.353537] Lustre: Skipped 8 previous similar messages
[71631.096577] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting
[71631.106759] Lustre: Skipped 6 previous similar messages
[71735.708236] Lustre: fir-MDT0002: Connection restored to 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4)
[71735.718671] Lustre: Skipped 11 previous similar messages
[72087.362149] Lustre: fir-MDT0002: haven't heard from client 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a5a3e6cfc00, cur 1572904460 expire 1572904310 last 1572904233
[72087.383961] Lustre: Skipped 4 previous similar messages
[72234.471103] Lustre: 43287:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1572904600/real 1572904600] req@ffff9a7c89363a80 x1649242005684640/t0(0) o106->fir-MDT0002@10.9.110.33@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1572904607 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[72234.498442] Lustre: 43287:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 5 previous similar messages
[72248.335450] Lustre: 40793:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1572904613/real 1572904613] req@ffff9a5d02e01b00 x1649242005709984/t0(0) o106->fir-MDT0002@10.9.110.33@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1572904620 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[72248.362791] Lustre: 40793:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 1 previous similar message
[72265.662879] Lustre: 43420:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1572904631/real 1572904631] req@ffff9a5e8cb60d80 x1649242005843088/t0(0) o106->fir-MDT0002@10.9.110.33@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1572904638 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[72265.690214] Lustre: 43420:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 7 previous similar messages
[72300.701770] Lustre: 43420:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1572904666/real 1572904666] req@ffff9a5e8cb60d80 x1649242005843088/t0(0) o106->fir-MDT0002@10.9.110.33@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1572904673 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[72300.729111] Lustre: 43420:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 19 previous similar messages
[72312.746399] Lustre: fir-MDT0002: Client 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4) reconnecting
[72312.756578] Lustre: Skipped 4 previous similar messages
[72365.359108] LustreError: 40793:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) ### client (nid 10.9.110.33@o2ib4) failed to reply to glimpse AST (req@ffff9a5d02e01b00 x1649242005709984 status 0 rc -5), evict it ns: mdt-fir-MDT0002_UUID lock: ffff9a5c5bb4e300/0x3428b9d2fb2a4ac0 lrc: 6/0,0 mode: PW/PW res: [0x2c0031430:0x4708:0x0].0x0 bits 0x40/0x0 rrc: 12 type: IBT flags: 0x40200000000000 nid: 10.9.110.33@o2ib4 remote: 0xc3d1967e71d81e9d expref: 52 pid: 43271 timeout: 0 lvb_type: 0
[72365.359114] LustreError: 138-a: fir-MDT0002: A client on nid 10.9.110.33@o2ib4 was evicted due to a lock glimpse callback time out: rc -5
[72365.359116] LustreError: Skipped 2 previous similar messages
[72365.359133] LustreError: 40361:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 213s: evicting client at 10.9.110.33@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff9a715a1ec800/0x3428b9d2fc41a6ef lrc: 3/0,0 mode: PW/PW res: [0x2c0032284:0xdf3:0x0].0x0 bits 0x40/0x0 rrc: 8 type: IBT flags: 0x40200000000000 nid: 10.9.110.33@o2ib4 remote: 0xc3d1967e71d873d1 expref: 49 pid: 43379 timeout: 0 lvb_type: 0
[72365.456763] LustreError: 40793:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) Skipped 3 previous similar messages
[72459.020797] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4)
[72459.031234] Lustre: Skipped 16 previous similar messages
[72689.383881] Lustre: fir-MDT0002: haven't heard from client 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a61a3ab2800, cur 1572905062 expire 1572904912 last 1572904835
[72689.405689] Lustre: Skipped 6 previous similar messages
[72914.873339] Lustre: fir-MDT0002: Client 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4) reconnecting
[72914.883513] Lustre: Skipped 4 previous similar messages
[73111.324500] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4)
[73111.334933] Lustre: Skipped 11 previous similar messages
[73438.381755] Lustre: fir-MDT0002: haven't heard from client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a5a8fab6400, cur 1572905811 expire 1572905661 last 1572905584
[73438.403560] Lustre: Skipped 7 previous similar messages
[73588.008734] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting
[73588.018907] Lustre: Skipped 5 previous similar messages
[73713.453851] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4)
[73713.464283] Lustre: Skipped 12 previous similar messages
[74050.392919] Lustre: fir-MDT0002: haven't heard from client 61df7307-81a0-565b-df06-b67958ebb2c9 (at 10.8.20.3@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a575b59e800, cur 1572906423 expire 1572906273 last 1572906196
[74050.414542] Lustre: Skipped 5 previous similar messages
[74294.746967] Lustre: fir-MDT0002: Client 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4) reconnecting
[74294.757146] Lustre: Skipped 4 previous similar messages
[74346.059833] Lustre: fir-MDT0002: Connection restored to a7ba760e-c35d-9ca2-cc34-86a4871c77f0 (at 10.9.112.2@o2ib4)
[74346.070189] Lustre: Skipped 11 previous similar messages
[74671.408711] Lustre: fir-MDT0002: haven't heard from client 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6ba770f800, cur 1572907044 expire 1572906894 last 1572906817
[74671.430522] Lustre: Skipped 4 previous similar messages
[74997.228263] Lustre: fir-MDT0002: Connection restored to 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4)
[74997.238697] Lustre: Skipped 10 previous similar messages
[75122.671290] Lustre: fir-MDT0002: Client 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4) reconnecting
[75122.681468] Lustre: Skipped 6 previous similar messages
[75424.434994] Lustre: fir-MDT0002: haven't heard from client 9b355e99-e1d7-dac2-f8b4-b4d40cfe0b74 (at 10.9.112.9@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a7dacac5c00, cur 1572907797 expire 1572907647 last 1572907570
[75424.456704] Lustre: Skipped 7 previous similar messages
[75670.364114] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4)
[75670.374544] Lustre: Skipped 11 previous similar messages
[75971.426522] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting
[75971.436700] Lustre: Skipped 5 previous similar messages
[76047.445440] Lustre: fir-MDT0002: haven't heard from client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6df1d85800, cur 1572908420 expire 1572908270 last 1572908193
[76047.467247] Lustre: Skipped 14 previous similar messages
[76313.868870] Lustre: fir-MDT0002: Connection restored to 1c6c3891-ae23-dfb8-944b-430147800558 (at 10.8.17.24@o2ib6)
[76313.879221] Lustre: Skipped 11 previous similar messages
[76323.944390] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.117.15@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server.
[76323.961783] LustreError: Skipped 220 previous similar messages
[76369.997030] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.8.17.24@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server.
[76413.006028] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.117.10@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server.
[76470.351201] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.8.17.24@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server.
[76470.368502] LustreError: Skipped 1 previous similar message
[76570.705841] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.8.17.24@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server.
[76570.723124] LustreError: Skipped 2 previous similar messages
[76577.810785] Lustre: fir-MDT0002: Client 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4) reconnecting
[76577.820962] Lustre: Skipped 5 previous similar messages
[76649.459480] Lustre: fir-MDT0002: haven't heard from client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a854907b400, cur 1572909022 expire 1572908872 last 1572908795
[76649.481273] Lustre: Skipped 3 previous similar messages
[76703.940889] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.108.25@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server.
[76703.958260] LustreError: Skipped 6 previous similar messages
[76924.794257] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4)
[76924.804711] Lustre: Skipped 14 previous similar messages
[76964.955567] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.117.10@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server.
[76964.972935] LustreError: Skipped 16 previous similar messages
[77179.937499] Lustre: fir-MDT0002: Client 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4) reconnecting
[77179.947669] Lustre: Skipped 3 previous similar messages
[77255.472667] Lustre: fir-MDT0002: haven't heard from client 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a90f82f2000, cur 1572909628 expire 1572909478 last 1572909401
[77255.494459] Lustre: Skipped 14 previous similar messages
[77491.812367] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.103.53@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server.
[77491.829746] LustreError: Skipped 52 previous similar messages
[77577.097266] Lustre: fir-MDT0002: Connection restored to abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4)
[77577.107706] Lustre: Skipped 13 previous similar messages
[77962.130730] Lustre: fir-MDT0002: Client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) reconnecting
[77962.140910] Lustre: Skipped 3 previous similar messages
[77987.498331] Lustre: fir-MDT0002: haven't heard from client abb5ac98-d884-746d-5bad-e1a980f92130 (at 10.9.110.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a7799de3c00, cur 1572910360 expire 1572910210 last 1572910133
[77987.520132] Lustre: Skipped 4 previous similar messages
[78076.961238] Lustre: 39606:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1572910442/real 1572910442] req@ffff9a81a913a880 x1649242031906160/t0(0) o41->fir-MDT0000-osp-MDT0002@10.0.10.51@o2ib7:24/4 lens 224/368 e 0 to 1 dl 1572910449 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[78076.989362] Lustre: 39606:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 36 previous similar messages
[78076.999195] Lustre: fir-MDT0000-osp-MDT0002: Connection to fir-MDT0000 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
[78100.983369] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.117.10@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server.
[78101.000734] LustreError: Skipped 64 previous similar messages
[78108.965015] Lustre: 39607:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1572910474/real 1572910474] req@ffff9a5b26f6e300 x1649242032065856/t0(0) o400->MGC10.0.10.51@o2ib7@10.0.10.51@o2ib7:26/25 lens 224/224 e 0 to 1 dl 1572910481 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
[78108.993042] Lustre: 39607:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
[78109.002791] LustreError: 166-1: MGC10.0.10.51@o2ib7: Connection to MGS (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will fail
[78139.812763] LNetError: 39555:0:(o2iblnd_cb.c:3350:kiblnd_check_txs_locked()) Timed out tx: tx_queue, 0 seconds
[78139.822767] LNetError: 39555:0:(o2iblnd_cb.c:3425:kiblnd_check_conns()) Timed out RDMA with 10.0.10.51@o2ib7 (30): c: 0, oc: 0, rc: 8
[78139.835256] Lustre: 39614:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1572910506/real 1572910512] req@ffff9a7d6cc14c80 x1649242032192160/t0(0) o400->fir-MDT0000-lwp-MDT0002@10.0.10.51@o2ib7:12/10 lens 224/224 e 0 to 1 dl 1572911262 ref 1 fl Rpc:eXN/0/ffffffff rc 0/-1
[78139.864011] Lustre: fir-MDT0000-lwp-MDT0002: Connection to fir-MDT0000 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
[78145.812924] LNet: 39555:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.51@o2ib7: 0 seconds
[78170.813536] LNet: 39555:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.51@o2ib7: 0 seconds
[78190.522205] Lustre: fir-MDT0002: Connection restored to 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4)
[78190.532645] Lustre: Skipped 12 previous similar messages
[78196.814162] LNet: 39555:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.51@o2ib7: 1 seconds
[78220.814748] LNet: 39555:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.51@o2ib7: 0 seconds
[78245.815360] LNet: 39555:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.51@o2ib7: 0 seconds
[78270.815973] LNet: 39555:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.51@o2ib7: 0 seconds
[78292.816508] LNet: 39555:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.51@o2ib7: 1 seconds
[78341.817703] LNet: 39555:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.51@o2ib7: 0 seconds
[78341.827789] LNet: 39555:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 1 previous similar message
[78417.657566] LNet: Service thread pid 43538 was inactive for 200.32s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[78417.674503] Pid: 43538, comm: mdt03_110 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 SMP Mon Aug 5 15:28:37 PDT 2019
[78417.684678] Call Trace:
[78417.687143] [] ptlrpc_set_wait+0x480/0x790 [ptlrpc]
[78417.693751] [] ptlrpc_queue_wait+0x83/0x230 [ptlrpc]
[78417.700440] [] ldlm_cli_enqueue+0x3d2/0x920 [ptlrpc]
[78417.707117] [] osp_md_object_lock+0x162/0x2d0 [osp]
[78417.713704] [] lod_object_lock+0xf3/0x7b0 [lod]
[78417.719926] [] mdd_object_lock+0x3e/0xe0 [mdd]
[78417.726074] [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt]
[78417.733342] [] mdt_remote_object_lock+0x2a/0x30 [mdt]
[78417.740100] [] mdt_rename_lock+0xbe/0x4b0 [mdt]
[78417.746325] [] mdt_reint_rename+0x2c5/0x2b90 [mdt]
[78417.752836] [] mdt_reint_rec+0x83/0x210 [mdt]
[78417.758891] [] mdt_reint_internal+0x6e3/0xaf0 [mdt]
[78417.765475] [] mdt_reint+0x67/0x140 [mdt]
[78417.771182] [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[78417.778146] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[78417.785864] [] ptlrpc_main+0xb2c/0x1460 [ptlrpc]
[78417.792203] [] kthread+0xd1/0xe0
[78417.797121] [] ret_from_fork_nospec_begin+0xe/0x21
[78417.803609] [] 0xffffffffffffffff
[78417.808634] LustreError: dumping log to /tmp/lustre-log.1572910790.43538
[78588.518218] Lustre: fir-MDT0002: haven't heard from client e0ff263f-ca42-9482-8c10-2958dcd9c2d5 (at 10.9.106.58@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a81be882400, cur 1572910961 expire 1572910811 last 1572910734
[78588.540010] Lustre: Skipped 5 previous similar messages
[78591.940189] Lustre: fir-MDT0002: Client 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4) reconnecting
[78591.950368] Lustre: Skipped 5 previous similar messages
[78641.536195] LustreError: 167-0: fir-MDT0000-lwp-MDT0002: This client was evicted by fir-MDT0000; in progress operations using this service will fail.
[78766.978286] Lustre: Evicted from MGS (at MGC10.0.10.51@o2ib7_0) after server handle changed from 0x1306fc8de5f52567 to 0x675682d7aa2e47b
[78793.178380] Lustre: fir-MDT0002: Connection restored to 1d70be10-e716-cc45-45aa-8ec410badfac (at 10.9.101.24@o2ib4)
[78793.188818] Lustre: Skipped 13 previous similar messages
[78811.941310] Lustre: 43302:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff9a910ddbad00 x1649305600797296/t0(0) o36->55060afd-312a-d8ec-7bf7-df48ba803f79@10.9.114.1@o2ib4:59/0 lens 552/2888 e 24 to 0 dl 1572911189 ref 2 fl Interpret:/0/0 rc 0/0
[78918.508539] LNet: Service thread pid 43538 completed after 701.15s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
[78968.795268] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.101.24@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server.
[78968.812637] LustreError: Skipped 3283 previous similar messages
[79182.838393] LNetError: 39555:0:(o2iblnd_cb.c:3350:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds
[79182.848570] LNetError: 39555:0:(o2iblnd_cb.c:3425:kiblnd_check_conns()) Timed out RDMA with 10.0.10.3@o2ib7 (106): c: 7, oc: 0, rc: 8
[79201.541496] Lustre: fir-MDT0002: haven't heard from client 151a14fd-eede-c30d-ad14-4e1815cd58c9 (at 10.9.105.15@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a8190e94400, cur 1572911574 expire 1572911424 last 1572911347
[79201.563303] Lustre: Skipped 1 previous similar message
[79645.163933] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.101.24@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server.
[79645.181322] LustreError: Skipped 124 previous similar messages
[79767.482386] Lustre: fir-MDT0002: Connection restored to df50fab3-7c43-062d-d7be-7d20335b8a0f (at 10.9.108.35@o2ib4)
[79767.492825] Lustre: Skipped 14 previous similar messages
[79815.392607] LustreError: 11-0: fir-MDT0000-osp-MDT0002: operation out_update to node 10.0.10.51@o2ib7 failed: rc = -19
[79815.403303] LustreError: Skipped 533 previous similar messages
[79815.409165] Lustre: fir-MDT0000-osp-MDT0002: Connection to fir-MDT0000 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
[79820.700131] Lustre: fir-MDT0000-lwp-MDT0002: Connection to fir-MDT0000 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
[80091.938612] LNet: Service thread pid 43371 was inactive for 200.53s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[80091.955546] Pid: 43371, comm: mdt01_069 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 SMP Mon Aug 5 15:28:37 PDT 2019
[80091.965721] Call Trace:
[80091.968204] [] ptlrpc_set_wait+0x480/0x790 [ptlrpc]
[80091.974798] [] ptlrpc_queue_wait+0x83/0x230 [ptlrpc]
[80091.981486] [] ldlm_cli_enqueue+0x3d2/0x920 [ptlrpc]
[80091.988155] [] osp_md_object_lock+0x162/0x2d0 [osp]
[80091.994737] [] lod_object_lock+0xf3/0x7b0 [lod]
[80092.000962] [] mdd_object_lock+0x3e/0xe0 [mdd]
[80092.007114] [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt]
[80092.014381] [] mdt_remote_object_lock+0x2a/0x30 [mdt]
[80092.021140] [] mdt_rename_lock+0xbe/0x4b0 [mdt]
[80092.027378] [] mdt_reint_rename+0x2c5/0x2b90 [mdt]
[80092.033877] [] mdt_reint_rec+0x83/0x210 [mdt]
[80092.039946] [] mdt_reint_internal+0x6e3/0xaf0 [mdt]
[80092.046537] [] mdt_reint+0x67/0x140 [mdt]
[80092.052238] [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[80092.059193] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[80092.066930] [] ptlrpc_main+0xb2c/0x1460 [ptlrpc]
[80092.073278] [] kthread+0xd1/0xe0
[80092.078194] [] ret_from_fork_nospec_begin+0xe/0x21
[80092.084683] [] 0xffffffffffffffff
[80092.089705] LustreError: dumping log to /tmp/lustre-log.1572912464.43371
[80121.764471] LustreError: 167-0: fir-MDT0000-lwp-MDT0002: This client was evicted by fir-MDT0000; in progress operations using this service will fail.
[80259.197781] LNet: Service thread pid 43371 completed after 367.78s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
[80284.545549] Lustre: fir-MDT0002: haven't heard from client 56be126d-7814-d380-b324-3068e32f3167 (at 10.9.101.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a73fd069800, cur 1572912657 expire 1572912507 last 1572912430
[80284.567344] Lustre: Skipped 3 previous similar messages
[81024.206798] Lustre: fir-MDT0002: Connection restored to 151a14fd-eede-c30d-ad14-4e1815cd58c9 (at 10.9.105.15@o2ib4)
[81024.217232] Lustre: Skipped 5 previous similar messages
[81054.584973] Lustre: fir-MDT0002: haven't heard from client c20b1107-8bbf-11f6-f0e1-96c356a905b8 (at 10.9.110.63@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a71b6e99400, cur 1572913427 expire 1572913277 last 1572913200
[81347.232232] LustreError: 11-0: fir-MDT0000-osp-MDT0002: operation ldlm_enqueue to node 10.0.10.51@o2ib7 failed: rc = -19
[81347.243109] Lustre: fir-MDT0000-osp-MDT0002: Connection to fir-MDT0000 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
[81351.105589] Lustre: fir-MDT0000-lwp-MDT0002: Connection to fir-MDT0000 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
[81393.315295] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.101.39@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server.
[81393.332680] LustreError: Skipped 2788 previous similar messages
[81436.729071] Lustre: fir-MDT0002: Connection restored to (at 10.0.10.51@o2ib7)
[81436.736306] Lustre: Skipped 1 previous similar message
[81451.460102] LustreError: 167-0: fir-MDT0000-lwp-MDT0002: This client was evicted by fir-MDT0000; in progress operations using this service will fail.
[81469.599324] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.0.10.105@o2ib7 (no target). If you are running an HA pair check that the target is mounted on the other server.
[81469.616695] LustreError: Skipped 1153 previous similar messages
[81484.917784] LustreError: 40361:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.9.105.8@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff9a80666bba80/0x3428b9d2f6aa3184 lrc: 3/0,0 mode: PW/PW res: [0x2c0032e31:0x55:0x0].0x0 bits 0x40/0x0 rrc: 7 type: IBT flags: 0x60200400000020 nid: 10.9.105.8@o2ib4 remote: 0xc8d327232167c750 expref: 7 pid: 40869 timeout: 81482 lvb_type: 0
[81484.955176] LustreError: 40361:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 2 previous similar messages
[81547.590318] LNet: Service thread pid 40869 was inactive for 200.35s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[81547.607255] Pid: 40869, comm: mdt02_006 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 SMP Mon Aug 5 15:28:37 PDT 2019
[81547.617432] Call Trace:
[81547.619901] [] ptlrpc_set_wait+0x480/0x790 [ptlrpc]
[81547.626527] [] ptlrpc_queue_wait+0x83/0x230 [ptlrpc]
[81547.633235] [] ldlm_cli_enqueue+0x3d2/0x920 [ptlrpc]
[81547.639938] [] osp_md_object_lock+0x162/0x2d0 [osp]
[81547.646521] [] lod_object_lock+0xf3/0x7b0 [lod]
[81547.652745] [] mdd_object_lock+0x3e/0xe0 [mdd]
[81547.658897] [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt]
[81547.666162] [] mdt_remote_object_lock+0x2a/0x30 [mdt]
[81547.672918] [] mdt_rename_lock+0xbe/0x4b0 [mdt]
[81547.679145] [] mdt_reint_rename+0x2c5/0x2b90 [mdt]
[81547.685641] [] mdt_reint_rec+0x83/0x210 [mdt]
[81547.691694] [] mdt_reint_internal+0x6e3/0xaf0 [mdt]
[81547.698280] [] mdt_reint+0x67/0x140 [mdt]
[81547.703997] [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[81547.710957] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[81547.718676] [] ptlrpc_main+0xb2c/0x1460 [ptlrpc]
[81547.725023] [] kthread+0xd1/0xe0
[81547.729940] [] ret_from_fork_nospec_begin+0xe/0x21
[81547.736430] [] 0xffffffffffffffff
[81547.741453] LustreError: dumping log to /tmp/lustre-log.1572913920.40869
[81553.222454] LNet: Service thread pid 43347 was inactive for 200.36s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[81553.239389] Pid: 43347, comm: mdt02_061 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 SMP Mon Aug 5 15:28:37 PDT 2019
[81553.249562] Call Trace:
[81553.252031] [] ptlrpc_set_wait+0x480/0x790 [ptlrpc]
[81553.258642] [] ptlrpc_queue_wait+0x83/0x230 [ptlrpc]
[81553.265331] [] ldlm_cli_enqueue+0x3d2/0x920 [ptlrpc]
[81553.272008] [] osp_md_object_lock+0x162/0x2d0 [osp]
[81553.278592] [] lod_object_lock+0xf3/0x7b0 [lod]
[81553.284815] [] mdd_object_lock+0x3e/0xe0 [mdd]
[81553.290959] [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt]
[81553.298242] [] mdt_remote_object_lock+0x2a/0x30 [mdt]
[81553.305000] [] mdt_rename_lock+0xbe/0x4b0 [mdt]
[81553.311225] [] mdt_reint_rename+0x2c5/0x2b90 [mdt]
[81553.317722] [] mdt_reint_rec+0x83/0x210 [mdt]
[81553.323773] [] mdt_reint_internal+0x6e3/0xaf0 [mdt]
[81553.330349] [] mdt_reint+0x67/0x140 [mdt]
[81553.336055] [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[81553.343020] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[81553.350737] [] ptlrpc_main+0xb2c/0x1460 [ptlrpc]
[81553.357077] [] kthread+0xd1/0xe0
[81553.361996] [] ret_from_fork_nospec_begin+0xe/0x21
[81553.368497] [] 0xffffffffffffffff
[81553.373516] LustreError: dumping log to /tmp/lustre-log.1572913925.43347
[81592.062241] Lustre: fir-MDT0000-osp-MDT0002: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7)
[81592.071905] Lustre: Skipped 2 previous similar messages
[81592.078755] LNet: Service thread pid 40869 completed after 244.84s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
[81592.094950] LNet: Skipped 1 previous similar message [81895.927917] LustreError: 40361:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.9.109.4@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff9a73d1308480/0x3428b9d2225ff726 lrc: 3/0,0 mode: PW/PW res: [0x2c0031428:0x2f3:0x0].0x0 bits 0x40/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.9.109.4@o2ib4 remote: 0xa6c52e7b39451e6b expref: 17 pid: 43317 timeout: 81893 lvb_type: 0 [81919.850887] Lustre: fir-MDT0002: Connection restored to 93405e85-095c-0cea-9806-4f996fb817fa (at 10.9.109.4@o2ib4) [82146.595093] Lustre: fir-MDT0002: haven't heard from client 93405e85-095c-0cea-9806-4f996fb817fa (at 10.9.109.4@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a51843e9c00, cur 1572914519 expire 1572914369 last 1572914292 [82146.616800] Lustre: Skipped 10 previous similar messages [82704.455018] Lustre: fir-MDT0002: Connection restored to b03fd709-6fe9-7b0d-2088-8c53033528b4 (at 10.9.115.8@o2ib4) [82704.465373] Lustre: Skipped 6 previous similar messages [82956.611592] Lustre: fir-MDT0002: haven't heard from client f3e3148e-c3d9-1e95-ad7b-f31e31645f11 (at 10.9.105.8@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a91a7f84800, cur 1572915329 expire 1572915179 last 1572915102 [82956.633301] Lustre: Skipped 1 previous similar message [83489.006083] Lustre: fir-MDT0002: Connection restored to 151a14fd-eede-c30d-ad14-4e1815cd58c9 (at 10.9.105.15@o2ib4) [83489.016515] Lustre: Skipped 8 previous similar messages [83539.133126] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.105.15@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. [83539.150511] LustreError: Skipped 7 previous similar messages [83580.640164] Lustre: fir-MDT0002: haven't heard from client d84bfd80-9170-84b3-0a17-3f86b551fab7 (at 10.9.107.7@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a90f6e9f400, cur 1572915953 expire 1572915803 last 1572915726 [83580.661867] Lustre: Skipped 6 previous similar messages [83639.487573] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.105.15@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. [83739.842297] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.105.15@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. [84467.610166] Lustre: fir-MDT0002: Connection restored to 56be126d-7814-d380-b324-3068e32f3167 (at 10.9.101.22@o2ib4) [84543.650661] Lustre: fir-MDT0002: haven't heard from client 09d5976f-36eb-8525-5fb7-c49d065b6a22 (at 10.9.117.5@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a8e1ec78400, cur 1572916916 expire 1572916766 last 1572916689 [84543.672368] Lustre: Skipped 12 previous similar messages [84798.741115] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.115.10@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. [84845.048243] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.110.63@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. [84887.234269] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.8.26.4@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. [84932.629995] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.108.48@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. [84932.647364] LustreError: Skipped 1 previous similar message [85032.986783] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.108.48@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. [85033.004155] LustreError: Skipped 2 previous similar messages [85125.976365] Lustre: fir-MDT0002: Connection restored to b830c0f5-f2ea-37a2-9dec-4618814c17af (at 10.8.18.15@o2ib6) [85125.986712] Lustre: Skipped 15 previous similar messages [85251.325497] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.103.72@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. [85251.342867] LustreError: Skipped 2 previous similar messages [85608.693536] Lustre: fir-MDT0002: haven't heard from client 18f3424c-7cce-31b6-0c78-e6e7e21ffd78 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a74f4cd6800, cur 1572917981 expire 1572917831 last 1572917754 [85656.242826] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.113.14@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. [85656.260198] LustreError: Skipped 4 previous similar messages [85707.562524] LustreError: 11-0: fir-MDT0000-osp-MDT0002: operation ldlm_enqueue to node 10.0.10.51@o2ib7 failed: rc = -19 [85707.573396] Lustre: fir-MDT0000-osp-MDT0002: Connection to fir-MDT0000 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete [85716.528236] Lustre: fir-MDT0000-lwp-MDT0002: Connection to fir-MDT0000 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete [85781.696790] Lustre: fir-MDT0002: Connection restored to (at 10.8.26.4@o2ib6) [85781.703934] Lustre: Skipped 7 previous similar messages [85907.893082] LNet: Service thread pid 43499 was inactive for 200.32s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: [85907.910016] Pid: 43499, comm: mdt01_103 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 SMP Mon Aug 5 15:28:37 PDT 2019 [85907.920192] Call Trace: [85907.922666] [] ptlrpc_set_wait+0x480/0x790 [ptlrpc] [85907.929263] [] ptlrpc_queue_wait+0x83/0x230 [ptlrpc] [85907.935952] [] ldlm_cli_enqueue+0x3d2/0x920 [ptlrpc] [85907.942628] [] osp_md_object_lock+0x162/0x2d0 [osp] [85907.949213] [] lod_object_lock+0xf3/0x7b0 [lod] [85907.955438] [] mdd_object_lock+0x3e/0xe0 [mdd] [85907.961588] [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] [85907.968855] [] mdt_remote_object_lock+0x2a/0x30 [mdt] [85907.975612] [] mdt_rename_lock+0xbe/0x4b0 [mdt] [85907.981853] [] mdt_reint_rename+0x2c5/0x2b90 [mdt] [85907.988350] [] mdt_reint_rec+0x83/0x210 [mdt] [85907.994405] [] mdt_reint_internal+0x6e3/0xaf0 [mdt] [85908.000988] [] mdt_reint+0x67/0x140 [mdt] [85908.006694] [] tgt_request_handle+0xaea/0x1580 [ptlrpc] [85908.013667] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [85908.021387] [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [85908.027734] [] kthread+0xd1/0xe0 [85908.032650] [] ret_from_fork_nospec_begin+0xe/0x21 [85908.039141] [] 0xffffffffffffffff [85908.044177] LustreError: dumping log to /tmp/lustre-log.1572918280.43499 [85910.453146] LNet: Service thread pid 43510 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: [85910.470080] Pid: 43510, comm: mdt02_093 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 SMP Mon Aug 5 15:28:37 PDT 2019 [85910.480255] Call Trace: [85910.482725] [] ptlrpc_set_wait+0x480/0x790 [ptlrpc] [85910.489335] [] ptlrpc_queue_wait+0x83/0x230 [ptlrpc] [85910.496025] [] ldlm_cli_enqueue+0x3d2/0x920 [ptlrpc] [85910.502699] [] osp_md_object_lock+0x162/0x2d0 [osp] [85910.509284] [] lod_object_lock+0xf3/0x7b0 [lod] [85910.515508] [] mdd_object_lock+0x3e/0xe0 [mdd] [85910.521659] [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] [85910.528926] [] mdt_remote_object_lock+0x2a/0x30 [mdt] [85910.535681] [] mdt_rename_lock+0xbe/0x4b0 [mdt] [85910.541907] [] mdt_reint_rename+0x2c5/0x2b90 [mdt] [85910.548404] [] mdt_reint_rec+0x83/0x210 [mdt] [85910.554456] [] mdt_reint_internal+0x6e3/0xaf0 [mdt] [85910.561032] [] mdt_reint+0x67/0x140 [mdt] [85910.566738] [] tgt_request_handle+0xaea/0x1580 [ptlrpc] [85910.573702] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [85910.581422] [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [85910.587777] [] kthread+0xd1/0xe0 [85910.592695] [] ret_from_fork_nospec_begin+0xe/0x21 [85910.599184] [] 0xffffffffffffffff [85910.604207] LustreError: dumping log to /tmp/lustre-log.1572918282.43510 [85910.965162] Pid: 43371, comm: mdt01_069 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 SMP Mon Aug 5 15:28:37 PDT 2019 [85910.975332] Call Trace: [85910.977794] [] ptlrpc_set_wait+0x480/0x790 [ptlrpc] [85910.984397] [] ptlrpc_queue_wait+0x83/0x230 [ptlrpc] [85910.991087] [] ldlm_cli_enqueue+0x3d2/0x920 [ptlrpc] [85910.997761] [] osp_md_object_lock+0x162/0x2d0 [osp] [85911.004345] [] lod_object_lock+0xf3/0x7b0 [lod] [85911.010572] [] mdd_object_lock+0x3e/0xe0 [mdd] [85911.016724] [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] [85911.023988] [] mdt_remote_object_lock+0x2a/0x30 [mdt] [85911.030747] [] mdt_rename_lock+0xbe/0x4b0 [mdt] [85911.036985] [] mdt_reint_rename+0x2c5/0x2b90 [mdt] [85911.043485] [] mdt_reint_rec+0x83/0x210 [mdt] [85911.049554] [] mdt_reint_internal+0x6e3/0xaf0 [mdt] [85911.056141] [] mdt_reint+0x67/0x140 [mdt] [85911.061837] [] 
tgt_request_handle+0xaea/0x1580 [ptlrpc] [85911.068807] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [85911.076517] [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [85911.082860] [] kthread+0xd1/0xe0 [85911.087777] [] ret_from_fork_nospec_begin+0xe/0x21 [85911.094267] [] 0xffffffffffffffff [85911.099294] LustreError: dumping log to /tmp/lustre-log.1572918283.43371 [85912.501202] LNet: Service thread pid 43163 was inactive for 200.55s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: [85912.518135] LNet: Skipped 1 previous similar message [85912.523109] Pid: 43163, comm: mdt01_023 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 SMP Mon Aug 5 15:28:37 PDT 2019 [85912.533298] Call Trace: [85912.535756] [] ptlrpc_set_wait+0x480/0x790 [ptlrpc] [85912.542346] [] ptlrpc_queue_wait+0x83/0x230 [ptlrpc] [85912.549035] [] ldlm_cli_enqueue+0x3d2/0x920 [ptlrpc] [85912.555703] [] osp_md_object_lock+0x162/0x2d0 [osp] [85912.562278] [] lod_object_lock+0xf3/0x7b0 [lod] [85912.568493] [] mdd_object_lock+0x3e/0xe0 [mdd] [85912.574638] [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] [85912.581910] [] mdt_remote_object_lock+0x2a/0x30 [mdt] [85912.588669] [] mdt_rename_lock+0xbe/0x4b0 [mdt] [85912.594896] [] mdt_reint_rename+0x2c5/0x2b90 [mdt] [85912.601398] [] mdt_reint_rec+0x83/0x210 [mdt] [85912.607445] [] mdt_reint_internal+0x6e3/0xaf0 [mdt] [85912.614021] [] mdt_reint+0x67/0x140 [mdt] [85912.619724] [] tgt_request_handle+0xaea/0x1580 [ptlrpc] [85912.626683] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [85912.634399] [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [85912.640739] [] kthread+0xd1/0xe0 [85912.645656] [] ret_from_fork_nospec_begin+0xe/0x21 [85912.652130] [] 0xffffffffffffffff [85912.657153] LustreError: dumping log to /tmp/lustre-log.1572918284.43163 [85917.237477] LustreError: 167-0: fir-MDT0000-lwp-MDT0002: This client was evicted by fir-MDT0000; in progress operations using this service will fail. [86049.244034] LNet: Service thread pid 43371 completed after 338.78s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). [86049.260197] LNet: Skipped 1 previous similar message [86321.362088] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.110.20@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. [86321.379463] LustreError: Skipped 2639 previous similar messages [86441.703310] Lustre: fir-MDT0002: haven't heard from client 12db1734-6c11-a510-5911-1f8778c71046 (at 10.9.105.2@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a91a8075000, cur 1572918814 expire 1572918664 last 1572918587 [86441.725014] Lustre: Skipped 2 previous similar messages [86863.141661] Lustre: fir-MDT0002: Connection restored to a6bc548c-0163-ffd4-f1e2-918d758acc26 (at 10.9.114.8@o2ib4) [86863.152011] Lustre: Skipped 5 previous similar messages [86923.485864] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.110.20@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. 
[86923.503231] LustreError: Skipped 10 previous similar messages [87203.544696] Lustre: 43289:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1572919568/real 1572919568] req@ffff9a61b5738d80 x1649242203640048/t0(0) o104->fir-MDT0002@10.9.101.57@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1572919575 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 [87210.571890] Lustre: 43289:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1572919575/real 1572919575] req@ffff9a61b5738d80 x1649242203640048/t0(0) o104->fir-MDT0002@10.9.101.57@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1572919582 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [87224.599269] Lustre: 43289:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1572919589/real 1572919589] req@ffff9a61b5738d80 x1649242203640048/t0(0) o104->fir-MDT0002@10.9.101.57@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1572919596 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [87224.626601] Lustre: 43289:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 1 previous similar message [87245.637836] Lustre: 43289:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1572919610/real 1572919610] req@ffff9a61b5738d80 x1649242203640048/t0(0) o104->fir-MDT0002@10.9.101.57@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1572919617 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [87245.665190] Lustre: 43289:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 2 previous similar messages [87280.675794] Lustre: 43289:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1572919645/real 1572919645] req@ffff9a61b5738d80 x1649242203640048/t0(0) o104->fir-MDT0002@10.9.101.57@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1572919652 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [87280.703124] Lustre: 43289:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 4 previous similar messages [87290.724095] Lustre: fir-MDT0002: haven't heard from client ba4a5ec4-9157-df0d-908f-480d06361292 (at 10.9.116.2@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a610d25c400, cur 1572919663 expire 1572919513 last 1572919436 [87290.745825] Lustre: Skipped 3 previous similar messages [87322.719330] LustreError: 43289:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) ### client (nid 10.9.101.57@o2ib4) returned error from blocking AST (req@ffff9a61b5738d80 x1649242203640048 status -107 rc -107), evict it ns: mdt-fir-MDT0002_UUID lock: ffff9a77670de300/0x3428b9d3053b9ba0 lrc: 4/0,0 mode: PR/PR res: [0x2c0032254:0x3d5:0x0].0x0 bits 0x5b/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.9.101.57@o2ib4 remote: 0xc88e2d0e2677cf6f expref: 26 pid: 43347 timeout: 87469 lvb_type: 0 [87322.762361] LustreError: 138-a: fir-MDT0002: A client on nid 10.9.101.57@o2ib4 was evicted due to a lock blocking callback time out: rc -107 [87322.774966] LustreError: Skipped 2 previous similar messages [87322.780650] LustreError: 40361:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 127s: evicting client at 10.9.101.57@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff9a77670de300/0x3428b9d3053b9ba0 lrc: 3/0,0 mode: PR/PR res: [0x2c0032254:0x3d5:0x0].0x0 bits 0x5b/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.9.101.57@o2ib4 remote: 0xc88e2d0e2677cf6f expref: 27 pid: 43347 timeout: 0 lvb_type: 0 [87529.105154] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.116.2@o2ib4 (no target). 
If you are running an HA pair check that the target is mounted on the other server. [87529.122438] LustreError: Skipped 18 previous similar messages [87570.113607] Lustre: fir-MDT0002: Connection restored to d8e7f814-ff12-2702-15d9-d4b9584d9335 (at 10.9.112.12@o2ib4) [87570.124039] Lustre: Skipped 2 previous similar messages [88024.686544] Lustre: fir-MDT0000-lwp-MDT0002: Connection to fir-MDT0000 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete [88027.454598] LustreError: 11-0: fir-MDT0000-osp-MDT0002: operation mds_statfs to node 10.0.10.51@o2ib7 failed: rc = -107 [88027.465381] Lustre: fir-MDT0000-osp-MDT0002: Connection to fir-MDT0000 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete [88125.042354] LustreError: 167-0: fir-MDT0000-lwp-MDT0002: This client was evicted by fir-MDT0000; in progress operations using this service will fail. [88142.339582] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.0.10.110@o2ib7 (no target). If you are running an HA pair check that the target is mounted on the other server. [88142.356958] LustreError: Skipped 842 previous similar messages [88237.048677] Lustre: fir-MDT0002: Connection restored to (at 10.9.105.2@o2ib4) [88237.055904] Lustre: Skipped 4 previous similar messages [88322.150723] LustreError: 11-0: fir-MDT0000-osp-MDT0002: operation mds_statfs to node 10.0.10.51@o2ib7 failed: rc = -19 [88322.161427] Lustre: fir-MDT0000-osp-MDT0002: Connection to fir-MDT0000 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete [88350.839572] Lustre: fir-MDT0000-lwp-MDT0002: Connection to fir-MDT0000 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete [88364.769747] Lustre: fir-MDT0002: haven't heard from client 4acbf1ac-22d9-9ec6-7b26-c0b1f503fe3a (at 10.9.108.46@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a81a0262000, cur 1572920737 expire 1572920587 last 1572920510 [88364.791535] Lustre: Skipped 3 previous similar messages [88426.105585] LustreError: 167-0: fir-MDT0000-osp-MDT0002: This client was evicted by fir-MDT0000; in progress operations using this service will fail. [88426.119048] LustreError: 43353:0:(mdt_reint.c:2333:mdt_reint_rename()) fir-MDT0002: can't lock FS for rename: rc = -5 [88426.129675] LustreError: 43353:0:(mdt_reint.c:2333:mdt_reint_rename()) Skipped 3 previous similar messages [88750.440602] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.117.13@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. [88750.457969] LustreError: Skipped 719 previous similar messages [89016.317370] Lustre: fir-MDT0002: Connection restored to 9cb21e95-4ca8-e0e4-d260-7d48045f8e7a (at 10.9.101.51@o2ib4) [89016.327802] Lustre: Skipped 5 previous similar messages [89081.274761] Lustre: 39608:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1572920697/real 1572920697] req@ffff9a7869580480 x1649242205795024/t0(0) o400->fir-MDT0000-lwp-MDT0002@0@lo:12/10 lens 224/224 e 0 to 1 dl 1572921453 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 [89081.302099] Lustre: 39608:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 6 previous similar messages [89385.875902] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.101.51@o2ib4 (no target). 
If you are running an HA pair check that the target is mounted on the other server. [89385.893268] LustreError: Skipped 233 previous similar messages [89670.267943] Lustre: fir-MDT0002: Connection restored to (at 10.9.101.50@o2ib4) [89993.365780] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.0.10.107@o2ib7 (no target). If you are running an HA pair check that the target is mounted on the other server. [89993.383147] LustreError: Skipped 204 previous similar messages [90024.830457] Lustre: fir-MDT0002: haven't heard from client 2562dafb-d1e2-8fd0-7e69-88908768a7d5 (at 10.8.18.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a81b2ecac00, cur 1572922397 expire 1572922247 last 1572922170 [90024.852179] Lustre: Skipped 3 previous similar messages [90100.810292] Lustre: fir-MDT0002: haven't heard from client 42006178-6fc8-8ec7-6930-1fcfc41282e8 (at 10.9.114.4@o2ib4) in 158 seconds. I think it's dead, and I am evicting it. exp ffff9a6cf578a000, cur 1572922473 expire 1572922323 last 1572922315 [90299.597341] Lustre: fir-MDT0002: Connection restored to 5da366be-1a01-1138-b078-67dfc6e741be (at 10.9.114.11@o2ib4) [90299.607774] Lustre: Skipped 6 previous similar messages [90471.310517] Lustre: Failing over fir-MDT0002 [90471.376271] Lustre: fir-MDT0002: Not available for connect from 10.9.106.8@o2ib4 (stopping) [90471.384657] Lustre: Skipped 1 previous similar message [90471.437748] LustreError: 11-0: fir-MDT0003-osp-MDT0002: operation mds_disconnect to node 10.0.10.54@o2ib7 failed: rc = -107 [90471.539820] LustreError: 66555:0:(osp_object.c:594:osp_attr_get()) fir-MDT0000-osp-MDT0002:osp_attr_get update error [0x20000000a:0x0:0x0]: rc = -108 [90471.565465] LustreError: 66555:0:(llog_cat.c:444:llog_cat_close()) fir-MDT0000-osp-MDT0002: failure destroying log during cleanup: rc = -108 [90471.890976] Lustre: fir-MDT0002: Not available for connect from 10.9.105.51@o2ib4 (stopping) [90471.899423] Lustre: Skipped 26 previous similar messages [90472.898893] Lustre: fir-MDT0002: Not available for connect from 10.9.105.10@o2ib4 (stopping) [90472.907343] Lustre: Skipped 87 previous similar messages [90473.426675] LustreError: 51822:0:(ldlm_lockd.c:2324:ldlm_cancel_handler()) ldlm_cancel from 10.8.0.67@o2ib6 arrived at 1572922845 with bad export cookie 3758458201188631527 [90474.644024] LustreError: 45004:0:(ldlm_lockd.c:2324:ldlm_cancel_handler()) ldlm_cancel from 10.9.116.7@o2ib4 arrived at 1572922846 with bad export cookie 3758458204839129581 [90474.899439] Lustre: fir-MDT0002: Not available for connect from 10.9.101.39@o2ib4 (stopping) [90474.907882] Lustre: Skipped 127 previous similar messages [90477.554832] LustreError: 45004:0:(ldlm_lockd.c:2324:ldlm_cancel_handler()) ldlm_cancel from 10.8.23.25@o2ib6 arrived at 1572922849 with bad export cookie 3758458201188626809 [90477.570298] LustreError: 45004:0:(ldlm_lockd.c:2324:ldlm_cancel_handler()) Skipped 2 previous similar messages [90478.911743] Lustre: fir-MDT0002: Not available for connect from 10.9.103.13@o2ib4 (stopping) [90478.920182] Lustre: Skipped 223 previous similar messages [90484.332375] LustreError: 45004:0:(ldlm_lockd.c:2324:ldlm_cancel_handler()) ldlm_cancel from 10.8.30.12@o2ib6 arrived at 1572922856 with bad export cookie 3758458201188629637 [90484.347852] LustreError: 45004:0:(ldlm_lockd.c:2324:ldlm_cancel_handler()) Skipped 2 previous similar messages [90487.437399] Lustre: server umount fir-MDT0002 complete [90531.637098] LNetError: 321:0:(o2iblnd_cb.c:2495:kiblnd_passive_connect()) 
Can't accept conn from 10.0.10.210@o2ib7 on NA (ib0:1:10.0.10.53): bad dst nid 10.0.10.53@o2ib7 [90533.507080] LNet: Removed LNI 10.0.10.53@o2ib7 [90944.679292] LNet: HW NUMA nodes: 4, HW CPU cores: 48, npartitions: 4 [90944.686855] alg: No test for adler32 (adler32-zlib) [90945.514520] Lustre: Lustre: Build Version: 2.12.3_2_gb033996 [90945.660927] LNet: 67041:0:(config.c:1627:lnet_inet_enumerate()) lnet: Ignoring interface em2: it's down [90945.670507] LNet: Using FastReg for registration [90945.687058] LNet: Added LNI 10.0.10.53@o2ib7 [8/256/0/180] [90946.953034] LDISKFS-fs (dm-0): file extents enabled, maximum tree depth=5 [90947.041103] LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,acl,no_mbcache,nodelalloc [90947.777844] LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.9.108.3@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. [90947.795149] LustreError: Skipped 2 previous similar messages [90948.296842] LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.9.108.10@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. [90948.314231] LustreError: Skipped 29 previous similar messages [90949.297622] LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.9.105.19@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. [90949.314987] LustreError: Skipped 49 previous similar messages [90951.337776] LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.31.8@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. [90951.354976] LustreError: Skipped 108 previous similar messages [90955.427789] LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.7@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. [90955.444993] LustreError: Skipped 160 previous similar messages [90960.693291] LNetError: 67102:0:(peer.c:3451:lnet_peer_ni_add_to_recoveryq_locked()) lpni 10.0.10.54@o2ib7 added to recovery queue. Health = 900 [90966.875573] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.108.67@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. [90966.892978] LustreError: Skipped 27 previous similar messages [90993.437096] LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.0.10.101@o2ib7 (no target). If you are running an HA pair check that the target is mounted on the other server. 
[90993.454510] LustreError: Skipped 223 previous similar messages [91005.708176] Lustre: fir-MDT0002: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900 [91005.943147] Lustre: fir-MDD0002: changelog on [91005.955008] Lustre: fir-MDT0002: in recovery but waiting for the first client to connect [91007.106552] Lustre: fir-MDT0002: Will be in recovery for at least 2:30, or until 1285 clients reconnect [91008.115624] Lustre: fir-MDT0002: Connection restored to (at 10.9.113.6@o2ib4) [91010.717008] Lustre: fir-MDT0002: Connection restored to (at 10.9.115.4@o2ib4) [91011.728650] Lustre: fir-MDT0002: Connection restored to 876a76d8-81f4-0ffd-b7d4-b9c13c4f123f (at 10.9.105.4@o2ib4) [91011.739008] Lustre: Skipped 250 previous similar messages [91013.734818] Lustre: fir-MDT0002: Connection restored to (at 10.8.19.8@o2ib6) [91013.741962] Lustre: Skipped 485 previous similar messages [91018.525627] Lustre: fir-MDT0002: Connection restored to (at 10.0.10.101@o2ib7) [91018.532950] Lustre: Skipped 556 previous similar messages [91025.450115] LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.0.10.104@o2ib7 (no target). If you are running an HA pair check that the target is mounted on the other server. [91025.467487] LustreError: Skipped 186 previous similar messages [91027.535956] Lustre: fir-MDT0002: Connection restored to (at 10.0.10.110@o2ib7) [91027.543323] Lustre: Skipped 48 previous similar messages [91070.881228] Lustre: fir-MDT0000-osp-MDT0002: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7) [91070.890897] Lustre: Skipped 30 previous similar messages [91104.960893] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.101.51@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. [91104.978310] LustreError: Skipped 58 previous similar messages [91109.689183] Lustre: fir-MDT0002: Connection restored to (at 10.0.10.54@o2ib7) [91109.696437] Lustre: Skipped 1 previous similar message [91109.723629] Lustre: fir-MDT0002: Client f0a8ec9b-fbf5-a8d2-cba4-506dafb70319 (at 10.9.110.5@o2ib4) reconnected, waiting for 1285 clients in recovery for 0:47 [91109.771885] Lustre: fir-MDT0002: Recovery over after 1:42, of 1285 clients 1285 recovered and 0 were evicted. [91249.894738] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.114.4@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. [91249.912022] LustreError: Skipped 5 previous similar messages [91525.869985] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.114.4@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. [91525.887271] LustreError: Skipped 5 previous similar messages [91570.321569] Lustre: fir-MDT0002: Connection restored to fb3cb37b-243b-3f46-8fa2-cf8b4c4ea43c (at 10.8.18.23@o2ib6) [91570.331918] Lustre: Skipped 2 previous similar messages [91867.846150] Lustre: fir-MDT0002: haven't heard from client 184b31b8-d3fb-9ad3-e8c5-15f4ffd4c6a7 (at 10.9.112.16@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a61b8f28400, cur 1572924240 expire 1572924090 last 1572924013 [92065.280537] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.8.18.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
[92065.297823] LustreError: Skipped 6 previous similar messages [92171.884706] Lustre: fir-MDT0002: haven't heard from client 61a9f06d-cc82-0724-e0b6-3870bcbc10b0 (at 10.9.108.40@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a91a7f84c00, cur 1572924544 expire 1572924394 last 1572924317 [92452.861989] Lustre: fir-MDT0002: haven't heard from client 14a7252d-059e-f55d-9bba-02f9b98a4298 (at 10.9.112.15@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a91bed0d000, cur 1572924825 expire 1572924675 last 1572924598 [92452.883775] Lustre: Skipped 1 previous similar message [92532.864559] Lustre: fir-MDT0002: haven't heard from client f8609ef5-d693-95e9-3517-c217855651fc (at 10.9.101.47@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a91a7fce000, cur 1572924905 expire 1572924755 last 1572924678 [92720.529147] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.8.18.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. [92720.546449] LustreError: Skipped 4 previous similar messages [92810.898709] Lustre: fir-MDT0002: haven't heard from client b17c7abb-80dc-5fb4-0fbb-a89b79b2c4ce (at 10.9.110.51@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a61b8f2d800, cur 1572925183 expire 1572925033 last 1572924956 [92810.920505] Lustre: Skipped 1 previous similar message [92950.574697] Lustre: fir-MDT0002: Connection restored to 184b31b8-d3fb-9ad3-e8c5-15f4ffd4c6a7 (at 10.9.112.16@o2ib4) [93275.762166] Lustre: fir-MDT0002: Connection restored to (at 10.9.112.17@o2ib4) [93322.656383] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.8.18.23@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. [93322.673665] LustreError: Skipped 17 previous similar messages [93544.604933] Lustre: fir-MDT0002: Connection restored to 61a9f06d-cc82-0724-e0b6-3870bcbc10b0 (at 10.9.108.40@o2ib4) [93597.893028] Lustre: fir-MDT0002: haven't heard from client 2e29037a-0922-7d2f-54c8-7dea50803481 (at 10.9.112.13@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a90fd62d400, cur 1572925970 expire 1572925820 last 1572925743 [93597.914835] Lustre: Skipped 2 previous similar messages [93928.710766] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.112.15@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. [93928.728156] LustreError: Skipped 43 previous similar messages [94115.223742] Lustre: fir-MDT0002: Connection restored to 7480f5f4-96e0-4b0e-03d7-5a127d3d492c (at 10.9.110.56@o2ib4) [94115.234182] Lustre: Skipped 1 previous similar message [94270.942798] Lustre: fir-MDT0002: Connection restored to (at 10.9.101.48@o2ib4) [94270.950118] Lustre: Skipped 3 previous similar messages [94362.912596] Lustre: fir-MDT0002: haven't heard from client 5a82ddaf-3375-cfa6-28c6-bbc71a4c3b31 (at 10.9.113.2@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a61b6124000, cur 1572926735 expire 1572926585 last 1572926508 [94531.099744] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.112.16@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. 
[94531.117113] LustreError: Skipped 58 previous similar messages [94712.681039] Lustre: fir-MDT0002: Connection restored to 2e29037a-0922-7d2f-54c8-7dea50803481 (at 10.9.112.13@o2ib4) [94712.691468] Lustre: Skipped 1 previous similar message [95142.425224] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.101.48@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. [95142.442597] LustreError: Skipped 68 previous similar messages [95344.941581] Lustre: fir-MDT0002: haven't heard from client 98f2e7a4-f7eb-bba0-b5ec-78d7aa66f6bd (at 10.9.108.53@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a90f9ab9800, cur 1572927717 expire 1572927567 last 1572927490 [95453.004976] Lustre: fir-MDT0002: Connection restored to (at 10.9.113.2@o2ib4) [95829.962057] Lustre: fir-MDT0002: haven't heard from client 46c68581-0a7a-c26b-fe11-9d23d4c06392 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a91ba0d3000, cur 1572928202 expire 1572928052 last 1572927975 [95854.491732] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.8.26.4@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. [95854.508935] LustreError: Skipped 10 previous similar messages [95893.711805] LustreError: 11-0: fir-MDT0000-osp-MDT0002: operation mds_statfs to node 10.0.10.51@o2ib7 failed: rc = -107 [95893.722595] Lustre: fir-MDT0000-osp-MDT0002: Connection to fir-MDT0000 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete [95908.544240] Lustre: fir-MDT0000-lwp-MDT0002: Connection to fir-MDT0000 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete [96082.710269] Lustre: fir-MDT0002: Connection restored to (at 10.0.10.51@o2ib7) [96082.717503] Lustre: Skipped 2 previous similar messages [96102.085441] LNet: Service thread pid 67701 was inactive for 200.43s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: [96102.102375] Pid: 67701, comm: mdt03_026 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 SMP Mon Aug 5 15:28:37 PDT 2019 [96102.112549] Call Trace: [96102.115020] [] ptlrpc_set_wait+0x480/0x790 [ptlrpc] [96102.121630] [] ptlrpc_queue_wait+0x83/0x230 [ptlrpc] [96102.128334] [] ldlm_cli_enqueue+0x3d2/0x920 [ptlrpc] [96102.135012] [] osp_md_object_lock+0x162/0x2d0 [osp] [96102.141596] [] lod_object_lock+0xf3/0x7b0 [lod] [96102.147812] [] mdd_object_lock+0x3e/0xe0 [mdd] [96102.153955] [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] [96102.161235] [] mdt_remote_object_lock+0x2a/0x30 [mdt] [96102.167987] [] mdt_rename_lock+0xbe/0x4b0 [mdt] [96102.174211] [] mdt_reint_rename+0x2c5/0x2b90 [mdt] [96102.180702] [] mdt_reint_rec+0x83/0x210 [mdt] [96102.186752] [] mdt_reint_internal+0x6e3/0xaf0 [mdt] [96102.193330] [] mdt_reint+0x67/0x140 [mdt] [96102.199046] [] tgt_request_handle+0xaea/0x1580 [ptlrpc] [96102.206026] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [96102.213742] [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [96102.220086] [] kthread+0xd1/0xe0 [96102.225015] [] ret_from_fork_nospec_begin+0xe/0x21 [96102.231508] [] 0xffffffffffffffff [96102.236529] LustreError: dumping log to /tmp/lustre-log.1572928474.67701 [96109.253779] LustreError: 167-0: fir-MDT0000-lwp-MDT0002: This client was evicted by fir-MDT0000; in progress operations using this service will fail. 
[96236.638484] LNet: Service thread pid 67701 completed after 334.98s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). [96353.201284] Lustre: 67148:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1572928718/real 1572928718] req@ffff9a90bebe3600 x1649329647903616/t0(0) o41->fir-MDT0000-osp-MDT0002@10.0.10.51@o2ib7:24/4 lens 224/368 e 0 to 1 dl 1572928725 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 [96353.229424] Lustre: fir-MDT0000-osp-MDT0002: Connection to fir-MDT0000 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete [96357.573407] Lustre: 67670:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1572928722/real 1572928722] req@ffff9a6170adb180 x1649329647910624/t0(0) o101->fir-MDT0000-osp-MDT0002@10.0.10.51@o2ib7:24/4 lens 328/344 e 0 to 1 dl 1572928729 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 [96455.071931] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.0.10.102@o2ib7 (no target). If you are running an HA pair check that the target is mounted on the other server. [96455.089308] LustreError: Skipped 3085 previous similar messages [96478.671795] Lustre: fir-MDT0000-lwp-MDT0002: Connection to fir-MDT0000 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete [96485.971350] Lustre: fir-MDT0002: haven't heard from client 1171a9b6-e69f-0d88-ca9a-349489fea18c (at 10.9.117.1@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a8de878b000, cur 1572928858 expire 1572928708 last 1572928631 [96485.993081] Lustre: Skipped 1 previous similar message [96551.121649] LNet: Service thread pid 67670 was inactive for 200.54s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: [96551.138586] Pid: 67670, comm: mdt00_024 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 SMP Mon Aug 5 15:28:37 PDT 2019 [96551.148757] Call Trace: [96551.151220] [] ptlrpc_set_wait+0x480/0x790 [ptlrpc] [96551.157830] [] ptlrpc_queue_wait+0x83/0x230 [ptlrpc] [96551.164521] [] ldlm_cli_enqueue+0x3d2/0x920 [ptlrpc] [96551.171187] [] osp_md_object_lock+0x162/0x2d0 [osp] [96551.177760] [] lod_object_lock+0xf3/0x7b0 [lod] [96551.183977] [] mdd_object_lock+0x3e/0xe0 [mdd] [96551.190120] [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] [96551.197386] [] mdt_remote_object_lock+0x2a/0x30 [mdt] [96551.204135] [] mdt_rename_lock+0xbe/0x4b0 [mdt] [96551.210360] [] mdt_reint_rename+0x2c5/0x2b90 [mdt] [96551.216847] [] mdt_reint_rec+0x83/0x210 [mdt] [96551.222898] [] mdt_reint_internal+0x6e3/0xaf0 [mdt] [96551.229462] [] mdt_reint+0x67/0x140 [mdt] [96551.235170] [] tgt_request_handle+0xaea/0x1580 [ptlrpc] [96551.242116] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [96551.249843] [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [96551.256189] [] kthread+0xd1/0xe0 [96551.261121] [] ret_from_fork_nospec_begin+0xe/0x21 [96551.267597] [] 0xffffffffffffffff [96551.272616] LustreError: dumping log to /tmp/lustre-log.1572928923.67670 [96557.265816] LNet: Service thread pid 68105 was inactive for 200.22s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: [96557.282753] Pid: 68105, comm: mdt02_093 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 SMP Mon Aug 5 15:28:37 PDT 2019 [96557.292941] Call Trace: [96557.295406] [] ptlrpc_set_wait+0x480/0x790 [ptlrpc] [96557.302014] [] ptlrpc_queue_wait+0x83/0x230 [ptlrpc] [96557.308696] [] ldlm_cli_enqueue+0x3d2/0x920 [ptlrpc] [96557.315368] [] osp_md_object_lock+0x162/0x2d0 [osp] [96557.321967] [] lod_object_lock+0xf3/0x7b0 [lod] [96557.328189] [] mdd_object_lock+0x3e/0xe0 [mdd] [96557.334339] [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] [96557.341604] [] mdt_remote_object_lock+0x2a/0x30 [mdt] [96557.348360] [] mdt_rename_lock+0xbe/0x4b0 [mdt] [96557.354585] [] mdt_reint_rename+0x2c5/0x2b90 [mdt] [96557.361086] [] mdt_reint_rec+0x83/0x210 [mdt] [96557.367137] [] mdt_reint_internal+0x6e3/0xaf0 [mdt] [96557.373712] [] mdt_reint+0x67/0x140 [mdt] [96557.379432] [] tgt_request_handle+0xaea/0x1580 [ptlrpc] [96557.386392] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [96557.394110] [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [96557.400451] [] kthread+0xd1/0xe0 [96557.405365] [] ret_from_fork_nospec_begin+0xe/0x21 [96557.411854] [] 0xffffffffffffffff [96557.416886] LustreError: dumping log to /tmp/lustre-log.1572928929.68105 [96579.026566] LustreError: 167-0: fir-MDT0000-lwp-MDT0002: This client was evicted by fir-MDT0000; in progress operations using this service will fail. [96660.214431] LNet: Service thread pid 67670 completed after 309.63s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). [96660.230610] LNet: Skipped 1 previous similar message [96802.048474] Lustre: 67145:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1572929167/real 1572929167] req@ffff9a9130d98d80 x1649329648337568/t0(0) o41->fir-MDT0000-osp-MDT0002@10.0.10.51@o2ib7:24/4 lens 224/368 e 0 to 1 dl 1572929174 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 [96802.076597] Lustre: fir-MDT0000-osp-MDT0002: Connection to fir-MDT0000 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete [96816.763530] Lustre: fir-MDT0002: Connection restored to 98f2e7a4-f7eb-bba0-b5ec-78d7aa66f6bd (at 10.9.108.53@o2ib4) [96816.773965] Lustre: Skipped 6 previous similar messages [96827.097462] Lustre: fir-MDT0000-lwp-MDT0002: Connection to fir-MDT0000 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete [97057.200347] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.0.10.102@o2ib7 (no target). If you are running an HA pair check that the target is mounted on the other server. [97057.217726] LustreError: Skipped 1027 previous similar messages [97060.535484] Lustre: 67148:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1572929425/real 1572929425] req@ffff9a8dabc7f980 x1649329648595456/t0(0) o41->fir-MDT0000-osp-MDT0002@10.0.10.51@o2ib7:24/4 lens 224/368 e 0 to 1 dl 1572929432 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 [97060.563608] Lustre: fir-MDT0000-osp-MDT0002: Connection to fir-MDT0000 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete [97135.841704] LustreError: 167-0: fir-MDT0000-lwp-MDT0002: This client was evicted by fir-MDT0000; in progress operations using this service will fail. 
[97162.090759] Lustre: fir-MDT0002: haven't heard from client 51a802df-6284-3ad2-1e1d-b64176d5ac03 (at 10.9.108.50@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a91ba0d2800, cur 1572929534 expire 1572929384 last 1572929307 [97238.017686] Lustre: fir-MDT0002: haven't heard from client 704d0e1f-fd24-cb25-7405-1f922fda9e45 (at 10.8.26.4@o2ib6) in 199 seconds. I think it's dead, and I am evicting it. exp ffff9a70befc7400, cur 1572929610 expire 1572929460 last 1572929411 [98061.784026] Lustre: fir-MDT0002: Connection restored to (at 10.8.26.4@o2ib6) [98061.791167] Lustre: Skipped 7 previous similar messages [98100.014060] Lustre: fir-MDT0002: haven't heard from client 9b6df508-4fd4-62e6-a19b-42f88c25e71f (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a796eec3c00, cur 1572930472 expire 1572930322 last 1572930245 [98178.020412] Lustre: fir-MDT0002: haven't heard from client 7c6046b4-0549-c93b-ae3a-91b356af20c1 (at 10.9.108.54@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a8ceab70800, cur 1572930550 expire 1572930400 last 1572930323 [98189.895108] Lustre: fir-MDT0002: Connection restored to (at 10.9.114.6@o2ib4) [98539.067693] Lustre: fir-MDT0002: Connection restored to 51a802df-6284-3ad2-1e1d-b64176d5ac03 (at 10.9.108.50@o2ib4) [98855.037233] Lustre: fir-MDT0002: haven't heard from client 2903778e-b047-7ccd-5d0e-97b037b5bfd4 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6fd22a0c00, cur 1572931227 expire 1572931077 last 1572931000 [99492.130849] Lustre: fir-MDT0002: Connection restored to 7c9f2d7a-9454-0f06-e00d-6ea2ec6b96b6 (at 10.9.110.42@o2ib4) [99492.141289] Lustre: Skipped 1 previous similar message [99714.451569] Lustre: fir-MDT0002: Connection restored to 7c6046b4-0549-c93b-ae3a-91b356af20c1 (at 10.9.108.54@o2ib4) [100298.151826] Lustre: fir-MDT0002: Connection restored to (at 10.9.105.30@o2ib4) [100331.072843] Lustre: fir-MDT0002: haven't heard from client 7d106163-8853-c8dd-a3b6-a425739a9740 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6d252cd000, cur 1572932703 expire 1572932553 last 1572932476 [100407.090601] Lustre: fir-MDT0002: haven't heard from client 60ba1642-dfb2-359a-9971-68bb83060e0a (at 10.9.116.6@o2ib4) in 190 seconds. I think it's dead, and I am evicting it. exp ffff9a91abf74000, cur 1572932779 expire 1572932629 last 1572932589 [100454.690260] Lustre: fir-MDT0002: Connection restored to (at 10.8.26.4@o2ib6) [100771.103678] Lustre: fir-MDT0002: haven't heard from client 87078e5a-71d7-36ab-201b-01f0b8f1b849 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a71abf8e400, cur 1572933143 expire 1572932993 last 1572932916 [101135.145915] Lustre: fir-MDT0002: Connection restored to (at 10.8.26.4@o2ib6) [101135.153147] Lustre: Skipped 1 previous similar message [101161.101652] Lustre: fir-MDT0002: haven't heard from client 168b5e58-ec68-d44f-ed7e-ac9cfc8408fd (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a76a33a8c00, cur 1572933533 expire 1572933383 last 1572933306 [102084.177924] Lustre: fir-MDT0002: Connection restored to d61c01fd-7673-605a-fed5-dbd34b64cc55 (at 10.8.23.8@o2ib6) [102530.138281] Lustre: fir-MDT0002: haven't heard from client 417919f9-8710-1e47-cf6d-a6066cbe249c (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9a62c3243c00, cur 1572934902 expire 1572934752 last 1572934675 [102720.448795] Lustre: fir-MDT0002: Connection restored to (at 10.8.26.4@o2ib6) [102720.456026] Lustre: Skipped 3 previous similar messages [102748.152101] Lustre: fir-MDT0002: haven't heard from client 71d3943c-3489-3470-31cf-b63bb95541b6 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a5e7da5f000, cur 1572935120 expire 1572934970 last 1572934893 [103909.166180] Lustre: fir-MDT0002: haven't heard from client 3da44ff6-ad4f-9855-11d5-4181b1b61a5f (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6091369400, cur 1572936281 expire 1572936131 last 1572936054 [104030.590219] Lustre: fir-MDT0002: Connection restored to (at 10.8.26.4@o2ib6) [106554.827422] Lustre: fir-MDT0002: Connection restored to (at 10.8.30.33@o2ib6) [106554.834739] Lustre: Skipped 1 previous similar message [108147.909182] Lustre: fir-MDT0002: Connection restored to a3b508eb-f449-4b0a-3bd3-d12f6d97173e (at 10.8.22.9@o2ib6) [108392.286543] Lustre: fir-MDT0002: Connection restored to (at 10.9.106.17@o2ib4) [108486.159754] Lustre: fir-MDT0002: Connection restored to a2ab548b-f23c-5628-09b8-db8c32d996b0 (at 10.9.107.21@o2ib4) [108594.289315] Lustre: fir-MDT0002: haven't heard from client 7f9aa22d-36da-1a94-f631-a264f7aa9590 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6d68b12000, cur 1572940966 expire 1572940816 last 1572940739 [108625.570677] Lustre: fir-MDT0002: Connection restored to (at 10.8.26.4@o2ib6) [108928.298053] Lustre: fir-MDT0002: haven't heard from client 61666ad4-a74e-87a0-ba54-1d3f2aad9b41 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6bc9bc2400, cur 1572941300 expire 1572941150 last 1572941073 [109179.905770] Lustre: fir-MDT0002: Connection restored to (at 10.8.26.4@o2ib6) [110161.017083] Lustre: fir-MDT0002: Connection restored to (at 10.8.26.4@o2ib6) [110194.331610] Lustre: fir-MDT0002: haven't heard from client 23e87ab7-b7a9-fab5-66b2-6a5845711369 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a72fa9fb800, cur 1572942566 expire 1572942416 last 1572942339 [110441.198031] Lustre: fir-MDT0002: Connection restored to (at 10.8.26.4@o2ib6) [110514.339320] Lustre: fir-MDT0002: haven't heard from client 5fc51756-3c12-1f30-edcd-d23b80a2b589 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6ae93ae800, cur 1572942886 expire 1572942736 last 1572942659 [111908.790886] Lustre: fir-MDT0002: Connection restored to (at 10.9.107.51@o2ib4) [112105.619179] Lustre: fir-MDT0002: Connection restored to e3815ef7-6838-dce7-b2c1-3bac335ceb16 (at 10.9.107.52@o2ib4) [116158.493277] Lustre: fir-MDT0002: haven't heard from client eec53668-9e11-a36f-991f-f83d08f9400c (at 10.9.108.69@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
[117605.211996] Lustre: fir-MDT0002: Connection restored to eec53668-9e11-a36f-991f-f83d08f9400c (at 10.9.108.69@o2ib4)
[126240.009759] Lustre: fir-MDT0002: Connection restored to 33237efb-3a1e-77f8-f27f-67527ff6c5d0 (at 10.8.27.14@o2ib6)
[128628.748427] Lustre: fir-MDT0002: Connection restored to f7328e8a-dd41-b37f-6216-bf41e0f28b31 (at 10.9.115.13@o2ib4)
[134969.847885] Lustre: fir-MDT0002: Connection restored to (at 10.9.115.12@o2ib4)
[135190.170157] Lustre: fir-MDT0002: Connection restored to eab9072e-516b-d3ae-8c06-2beb6746b5e0 (at 10.9.116.5@o2ib4)
[138758.084300] Lustre: fir-MDT0002: haven't heard from client 51a21c02-9c85-2ad1-5519-18d441d20b35 (at 10.9.110.14@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a7e3c0b5400, cur 1572971129 expire 1572970979 last 1572970902
[139407.200026] Lustre: fir-MDT0002: Connection restored to 1c5a5b33-80e7-daef-ca6c-cf5e9a93e132 (at 10.9.106.4@o2ib4)
[139474.104920] Lustre: fir-MDT0002: haven't heard from client b183ba99-6f34-bb51-d879-80707620fdc9 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a91abf75000, cur 1572971845 expire 1572971695 last 1572971618
[139474.126710] Lustre: Skipped 38 previous similar messages
[139791.365887] Lustre: fir-MDT0002: Connection restored to d1b0feae-da98-8c54-59e0-3db26ca65a40 (at 10.9.114.8@o2ib4)
[139813.360919] Lustre: fir-MDT0002: Connection restored to d1542b28-5e7a-e84d-4f8a-41f687fce618 (at 10.9.112.12@o2ib4)
[139853.921775] Lustre: fir-MDT0002: Connection restored to c349ec39-0062-da5c-32de-3988568954ac (at 10.9.115.10@o2ib4)
[139882.730560] Lustre: fir-MDT0002: Connection restored to f7328e8a-dd41-b37f-6216-bf41e0f28b31 (at 10.9.115.13@o2ib4)
[139899.007657] Lustre: fir-MDT0002: Connection restored to 2e29037a-0922-7d2f-54c8-7dea50803481 (at 10.9.112.13@o2ib4)
[139899.018177] Lustre: Skipped 3 previous similar messages
[140022.843454] Lustre: fir-MDT0002: Connection restored to e5149e92-e3d8-6198-dfe6-33bf64d3e481 (at 10.9.110.20@o2ib4)
[140022.853980] Lustre: Skipped 1 previous similar message
[140054.944109] Lustre: fir-MDT0002: Connection restored to 6597d5ee-b217-7224-b01c-5eafbd0ac66e (at 10.9.110.55@o2ib4)
[140054.954635] Lustre: Skipped 3 previous similar messages
[140126.679424] Lustre: fir-MDT0002: Connection restored to 61a9f06d-cc82-0724-e0b6-3870bcbc10b0 (at 10.9.108.40@o2ib4)
[140126.689944] Lustre: Skipped 7 previous similar messages
[140262.121358] Lustre: fir-MDT0002: haven't heard from client 46c32956-662b-0706-c80a-bc0e57525ada (at 10.9.106.4@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6032d87800, cur 1572972633 expire 1572972483 last 1572972406
[140408.714361] Lustre: fir-MDT0002: Connection restored to 60ba1642-dfb2-359a-9971-68bb83060e0a (at 10.9.116.6@o2ib4)
[140408.724797] Lustre: Skipped 11 previous similar messages
[140724.945209] Lustre: fir-MDT0002: Connection restored to ab40eb68-3b81-40b2-4c7f-baae0365294b (at 10.9.110.21@o2ib4)
[140724.955738] Lustre: Skipped 6 previous similar messages
[141250.495177] Lustre: fir-MDT0002: Connection restored to 4983aa27-7e1b-a0cf-2f90-20677288041d (at 10.9.102.46@o2ib4)
[141250.505702] Lustre: Skipped 7 previous similar messages
[141944.165679] Lustre: fir-MDT0002: haven't heard from client 8e383076-a475-0bf7-60a7-327bb5d9b5ef (at 10.9.116.10@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a91282d1000, cur 1572974315 expire 1572974165 last 1572974088
[142573.330981] Lustre: fir-MDT0002: Connection restored to fcbbb11e-8d50-bcd9-399d-80ba49fc0a87 (at 10.9.107.4@o2ib4)
[142573.341420] Lustre: Skipped 3 previous similar messages
[142999.191762] Lustre: fir-MDT0002: haven't heard from client f9d7a98a-9714-1d86-ab2e-6cf84b814c56 (at 10.9.115.3@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a71b5297000, cur 1572975370 expire 1572975220 last 1572975143
[142999.213549] Lustre: Skipped 1 previous similar message
[143085.082186] Lustre: fir-MDT0002: Connection restored to ba15a10b-a95d-133b-f873-39f741c8accb (at 10.9.115.11@o2ib4)
[144007.201186] Lustre: fir-MDT0002: Connection restored to 8e383076-a475-0bf7-60a7-327bb5d9b5ef (at 10.9.116.10@o2ib4)
[144214.212793] Lustre: fir-MDT0002: Connection restored to f9d7a98a-9714-1d86-ab2e-6cf84b814c56 (at 10.9.115.3@o2ib4)
[144818.237708] Lustre: fir-MDT0002: haven't heard from client 4a58c126-a568-2ef3-292b-60f093d08be1 (at 10.9.115.3@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a7ea87c4c00, cur 1572977189 expire 1572977039 last 1572976962
[146053.548200] Lustre: fir-MDT0002: Connection restored to f9d7a98a-9714-1d86-ab2e-6cf84b814c56 (at 10.9.115.3@o2ib4)
[146859.997456] Lustre: fir-MDT0002: Connection restored to 8e383076-a475-0bf7-60a7-327bb5d9b5ef (at 10.9.116.10@o2ib4)
[147105.297013] Lustre: fir-MDT0002: haven't heard from client d9e8d1d1-af07-ce57-473c-319ce9637cb5 (at 10.9.116.3@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a91b68fdc00, cur 1572979476 expire 1572979326 last 1572979249
[147105.318805] Lustre: Skipped 1 previous similar message
[147698.311572] Lustre: fir-MDT0002: haven't heard from client 0173e697-ddaa-53e3-60b2-4f9d4eed09b5 (at 10.9.107.50@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a90fd62d800, cur 1572980069 expire 1572979919 last 1572979842
[147698.333448] Lustre: Skipped 2 previous similar messages
[148211.905548] Lustre: fir-MDT0002: Connection restored to ba15a10b-a95d-133b-f873-39f741c8accb (at 10.9.115.11@o2ib4)
[148278.488089] Lustre: fir-MDT0002: Connection restored to d9e8d1d1-af07-ce57-473c-319ce9637cb5 (at 10.9.116.3@o2ib4)
[148444.099934] Lustre: fir-MDT0002: Connection restored to fcbbb11e-8d50-bcd9-399d-80ba49fc0a87 (at 10.9.107.4@o2ib4)
[149123.165497] Lustre: fir-MDT0002: Connection restored to (at 10.9.107.50@o2ib4)
[149191.046248] Lustre: fir-MDT0002: Connection restored to (at 10.9.107.49@o2ib4)
[150496.078688] Lustre: fir-MDT0002: Connection restored to (at 10.9.112.1@o2ib4)
[150992.397015] Lustre: fir-MDT0002: haven't heard from client d0970a80-0067-6f05-bc50-bc98c606719c (at 10.9.106.60@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a91bedf4800, cur 1572983363 expire 1572983213 last 1572983136
[151859.803498] Lustre: fir-MDT0002: Connection restored to 116c3749-10d9-8e0c-2705-ce8a50643e3c (at 10.9.114.2@o2ib4)
[152432.571795] Lustre: fir-MDT0002: Connection restored to ea567c48-321d-2dc7-281e-feb98b24b0a8 (at 10.9.107.29@o2ib4)
[152456.792661] Lustre: fir-MDT0002: Connection restored to (at 10.9.106.60@o2ib4)
[154461.484187] Lustre: fir-MDT0002: haven't heard from client 7f4b342c-9749-8c84-c672-1dbe439818af (at 10.9.106.54@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a61b8f2fc00, cur 1572986832 expire 1572986682 last 1572986605
[154611.441609] Lustre: fir-MDT0002: Connection restored to (at 10.9.106.54@o2ib4)
[155190.502408] Lustre: fir-MDT0002: haven't heard from client 9d70720d-7016-4c13-d12c-539714dab902 (at 10.9.108.20@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a91bedf1000, cur 1572987561 expire 1572987411 last 1572987334
[155326.039071] Lustre: fir-MDT0002: Connection restored to (at 10.9.108.20@o2ib4)
[156707.539074] Lustre: fir-MDT0002: haven't heard from client ec422765-5d0c-62dc-308c-7711ebc93482 (at 10.9.108.20@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a7acc9d2800, cur 1572989078 expire 1572988928 last 1572988851
[156744.756459] Lustre: fir-MDT0002: Connection restored to (at 10.9.108.20@o2ib4)
[157828.667594] Lustre: fir-MDT0002: Connection restored to (at 10.9.106.26@o2ib4)
[158347.549682] Lustre: fir-MDT0002: Connection restored to (at 10.9.101.40@o2ib4)
[159345.604272] Lustre: fir-MDT0002: haven't heard from client f105afde-8fa5-537d-8346-d77717e1b922 (at 10.8.31.10@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a61be4c6800, cur 1572991716 expire 1572991566 last 1572991489
[159959.628540] Lustre: fir-MDT0002: haven't heard from client f4f23800-f63d-171b-dcef-50edbe5d431b (at 10.9.106.13@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a919d656000, cur 1572992330 expire 1572992180 last 1572992103
[160906.961269] Lustre: fir-MDT0002: Connection restored to f105afde-8fa5-537d-8346-d77717e1b922 (at 10.8.31.10@o2ib6)
[161291.327388] Lustre: fir-MDT0002: Connection restored to f4f23800-f63d-171b-dcef-50edbe5d431b (at 10.9.106.13@o2ib4)
[161497.674051] Lustre: fir-MDT0002: Connection restored to (at 10.8.30.5@o2ib6)
[163952.719553] Lustre: fir-MDT0002: haven't heard from client 01c55571-2d56-bead-b405-095b48ee38e2 (at 10.9.107.50@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a712abcc800, cur 1572996323 expire 1572996173 last 1572996096
[165053.045315] Lustre: fir-MDT0002: Connection restored to (at 10.9.108.11@o2ib4)
[165070.485442] Lustre: fir-MDT0002: Connection restored to f5b732ff-4959-4283-d29c-fcd8fac11c91 (at 10.9.113.1@o2ib4)
[165315.585830] Lustre: fir-MDT0002: Connection restored to e948087b-2fe3-2a51-0694-ae2aa65c0d94 (at 10.9.109.66@o2ib4)
[165332.376264] Lustre: fir-MDT0002: Connection restored to 041c1209-eec6-f8ce-c95d-e7e9e84ecf6a (at 10.9.109.68@o2ib4)
[165339.029753] Lustre: fir-MDT0002: Connection restored to (at 10.9.109.4@o2ib4)
[165339.037071] Lustre: Skipped 1 previous similar message
[165366.946298] Lustre: fir-MDT0002: Connection restored to 1dd1485f-e8f9-92c2-846c-51ec24edf6f9 (at 10.9.101.59@o2ib4)
[165398.903793] Lustre: fir-MDT0002: Connection restored to (at 10.9.107.49@o2ib4)
[165938.476937] Lustre: fir-MDT0002: Connection restored to (at 10.9.112.1@o2ib4)
[166841.555229] Lustre: fir-MDT0002: Connection restored to 02487c8a-1e5b-6348-477f-7798269718c0 (at 10.9.101.53@o2ib4)
[166852.428016] Lustre: fir-MDT0002: Connection restored to (at 10.9.101.30@o2ib4)
[166869.840960] Lustre: fir-MDT0002: Connection restored to (at 10.9.115.1@o2ib4)
[167033.214383] LNetError: 321:0:(o2iblnd_cb.c:2961:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: o2iblnd fatal error
[167033.224826] LNetError: 321:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[167034.690707] Lustre: fir-MDT0002: Client e69c0ae9-d9c9-3930-df16-60df362bd9fc (at 10.8.24.25@o2ib6) reconnecting
[167034.700905] Lustre: fir-MDT0002: Connection restored to e69c0ae9-d9c9-3930-df16-60df362bd9fc (at 10.8.24.25@o2ib6)
[167034.711348] Lustre: Skipped 1 previous similar message
[167035.233272] Lustre: fir-MDT0002: Client 4fb4463b-4df1-b2ca-bcaf-03821e29c498 (at 10.8.8.31@o2ib6) reconnecting
[167036.278910] Lustre: fir-MDT0002: Client 89ce1cbf-19a1-5ea3-ec1d-d2cf9a23d0e3 (at 10.8.30.34@o2ib6) reconnecting
[167036.289095] Lustre: Skipped 2 previous similar messages
[167038.304965] Lustre: fir-MDT0002: Client d0f8b9e4-69e3-eae4-954d-2e404bd55325 (at 10.8.26.14@o2ib6) reconnecting
[167038.315144] Lustre: Skipped 1 previous similar message
[167038.758246] LustreError: 67993:0:(ldlm_lib.c:3256:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff9a71b6548050 x1649068433328064/t0(0) o4->67360d0f-602d-e0fd-a763-b6dc0eec238b@10.8.27.35@o2ib6:703/0 lens 488/448 e 0 to 0 dl 1572999413 ref 1 fl Interpret:/0/0 rc 0/0
[167038.780265] Lustre: fir-MDT0002: Bulk IO write error with 67360d0f-602d-e0fd-a763-b6dc0eec238b (at 10.8.27.35@o2ib6), client will retry: rc = -110
[167038.795603] LustreError: 67993:0:(ldlm_lib.c:3256:target_bulk_io()) Skipped 1 previous similar message
[167040.880296] Lustre: 67839:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1572999404/real 1572999404] req@ffff9a7ac8e66300 x1649329889803360/t0(0) o104->fir-MDT0002@10.8.0.68@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1572999411 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[167042.581553] Lustre: fir-MDT0002: Client b44d8559-b6fa-c6ac-9733-9495841decff (at 10.8.17.25@o2ib6) reconnecting
[167042.591726] Lustre: Skipped 13 previous similar messages
[167049.421502] LustreError: 68019:0:(ldlm_lib.c:3256:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff9a5581b8d850 x1648868991085056/t0(0) o4->8dd84d2f-366f-ac1d-06b8-51ead500d18d@10.8.20.28@o2ib6:729/0 lens 520/456 e 1 to 0 dl 1572999439 ref 1 fl Interpret:/0/0 rc 0/0
[167049.445648] Lustre: fir-MDT0002: Bulk IO write error with 8dd84d2f-366f-ac1d-06b8-51ead500d18d (at 10.8.20.28@o2ib6), client will retry: rc = -110
[167049.458881] Lustre: Skipped 1 previous similar message
[167049.584519] LNetError: 67102:0:(peer.c:3451:lnet_peer_ni_add_to_recoveryq_locked()) lpni 10.0.10.201@o2ib7 added to recovery queue. Health = 900
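Note on the "Reconnect on bulk WRITE req@..." dumps above: they follow the usual ptlrpc DEBUG_REQ layout, where x is the RPC XID, t the transno, oN the opcode (o4 here, a bulk write), followed by the client UUID@NID, the request/reply buffer lengths (lens), and dl, the RPC deadline in epoch seconds. A rough triage parser, offered only as a sketch under those field-meaning assumptions (adjust for your Lustre version):

import re

# Hedged sketch: pull a few fields out of the "req@..." dumps in this log.
REQ = re.compile(
    r"req@(?P<addr>[0-9a-f]+)"
    r" x(?P<xid>\d+)/t(?P<transno>-?\d+)"
    r".*?o(?P<opcode>\d+)->(?P<peer>\S+?)@(?P<nid>[0-9.]+@o2ib\d*)"
    r".*?lens (?P<reqlen>\d+)/(?P<replen>\d+)"
    r".*?dl (?P<deadline>\d+)"
)

line = ("req@ffff9a71b6548050 x1649068433328064/t0(0) "
        "o4->67360d0f-602d-e0fd-a763-b6dc0eec238b@10.8.27.35@o2ib6:703/0 "
        "lens 488/448 e 0 to 0 dl 1572999413 ref 1 fl Interpret:/0/0 rc 0/0")
m = REQ.search(line)
print(m.group("opcode"), m.group("nid"), m.group("deadline"))
# -> 4 10.8.27.35@o2ib6 1572999413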
[167049.584526] LustreError: 67992:0:(sec.c:2485:sptlrpc_svc_unwrap_bulk()) @@@ truncated bulk GET 0(15971) req@ffff9a5581b88850 x1648390895620192/t0(0) o4->48f67746-6174-d4eb-bf6b-7295eeca30af@10.8.24.7@o2ib6:729/0 lens 520/456 e 1 to 0 dl 1572999439 ref 1 fl Interpret:/0/0 rc 0/0
[167049.628511] Lustre: 67839:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1572999412/real 1572999412] req@ffff9a74b91b1f80 x1649329889827376/t0(0) o104->fir-MDT0002@10.8.28.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1572999419 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[167051.269143] Lustre: fir-MDT0002: Client 630738cc-8718-c621-48f8-a25afd191c66 (at 10.8.25.15@o2ib6) reconnecting
[167051.279336] Lustre: Skipped 14 previous similar messages
[167058.460727] LustreError: 68016:0:(ldlm_lib.c:3256:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff9a5dc759d850 x1648863586163200/t0(0) o4->04df2f2a-8ca2-90e6-6d1b-9d261ac20550@10.8.20.6@o2ib6:738/0 lens 520/456 e 1 to 0 dl 1572999448 ref 1 fl Interpret:/0/0 rc 0/0
[167058.484806] Lustre: fir-MDT0002: Bulk IO write error with 04df2f2a-8ca2-90e6-6d1b-9d261ac20550 (at 10.8.20.6@o2ib6), client will retry: rc = -110
[167058.497930] Lustre: Skipped 1 previous similar message
[167061.823811] LustreError: 68150:0:(ldlm_lib.c:3256:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff9a7ed01a1050 x1649068433629936/t0(0) o4->67360d0f-602d-e0fd-a763-b6dc0eec238b@10.8.27.35@o2ib6:1/0 lens 488/448 e 0 to 0 dl 1572999466 ref 1 fl Interpret:/0/0 rc 0/0
[167061.847809] Lustre: fir-MDT0002: Bulk IO write error with 67360d0f-602d-e0fd-a763-b6dc0eec238b (at 10.8.27.35@o2ib6), client will retry: rc = -110
[167068.947645] LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.25.29@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server.
[167068.965032] LustreError: Skipped 1203 previous similar messages
[167137.152329] Lustre: fir-MDT0002: Client c7368b01-3213-c7f8-e4d5-697363e9817e (at 10.8.25.21@o2ib6) reconnecting
[167137.162507] Lustre: Skipped 26 previous similar messages
[167137.167936] Lustre: fir-MDT0002: Connection restored to c7368b01-3213-c7f8-e4d5-697363e9817e (at 10.8.25.21@o2ib6)
[167137.178408] Lustre: Skipped 61 previous similar messages
[167321.641278] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.204@o2ib7: 0 seconds
[167321.651555] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[167328.641428] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.204@o2ib7: 1 seconds
[167328.651702] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[167329.664611] LNetError: 67102:0:(peer.c:3451:lnet_peer_ni_add_to_recoveryq_locked()) lpni 10.0.10.204@o2ib7 added to recovery queue. Health = 900
[167334.641599] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.204@o2ib7: 0 seconds
[167334.651872] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[167334.677570] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.204@o2ib7: -125
[167345.641869] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.204@o2ib7: 0 seconds
[167345.652128] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 1 previous similar message
[167345.661443] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[167355.642108] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.204@o2ib7: 0 seconds
[167355.652379] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[167365.642348] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.204@o2ib7: 0 seconds
[167381.642762] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.204@o2ib7: 1 seconds
[167381.653044] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[167381.665055] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 1 previous similar message
[167415.643602] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.204@o2ib7: 0 seconds
[167415.653862] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 2 previous similar messages
[167415.663261] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[167415.675269] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 2 previous similar messages
[167490.645456] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.204@o2ib7: 0 seconds
[167490.655716] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 5 previous similar messages
[167490.665136] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[167490.677139] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 5 previous similar messages
[167519.258903] Lustre: fir-MDT0002: Connection restored to b503fa75-0c14-82b8-2753-24561210f8c9 (at 10.8.0.66@o2ib6)
[167519.269259] Lustre: Skipped 4 previous similar messages
[167622.648715] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.204@o2ib7: 0 seconds
[167622.658981] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 10 previous similar messages
[167622.668500] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[167622.680497] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 10 previous similar messages
[167634.711992] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.204@o2ib7: -125
[167725.812409] Lustre: fir-MDT0002: haven't heard from client b3a05eed-2a4b-daf7-96e8-65768daebb42 (at 10.8.7.13@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a71afe65c00, cur 1573000096 expire 1572999946 last 1572999869
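Note on the recurring "ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900" lines (rejoined above; the console had wrapped them mid-message): these come from LNet health tracking, where each send failure debits an interface's health value and queues it for recovery pings. A steady reading of 900 is consistent with the commonly cited defaults of a 1000 ceiling and a 100-point penalty, though both are tunables and version-dependent, so treat that reading as an assumption. A small tally over a saved console capture ("console.log" is a placeholder path):

import re
from collections import Counter

# Count which local NIs / peer NIs keep landing on the LNet recovery queue.
pat = re.compile(r"(ni|lpni) ([0-9.]+@o2ib\d*) added to recovery queue\. Health = (\d+)")
tally = Counter()
with open("console.log") as f:   # placeholder: a dmesg/console capture file
    for line in f:
        m = pat.search(line)
        if m:
            tally[(m.group(1), m.group(2), m.group(3))] += 1
for (kind, nid, health), n in tally.most_common():
    print(f"{kind} {nid}: {n} occurrences, health {health}")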
[167725.834136] Lustre: Skipped 4 previous similar messages
[167732.219935] Lustre: fir-MDT0002: Client 67360d0f-602d-e0fd-a763-b6dc0eec238b (at 10.8.27.35@o2ib6) reconnecting
[167732.230112] Lustre: Skipped 4 previous similar messages
[167732.278398] LustreError: 67993:0:(ldlm_lib.c:3256:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff9a6645a75050 x1649068446083728/t0(0) o4->67360d0f-602d-e0fd-a763-b6dc0eec238b@10.8.27.35@o2ib6:666/0 lens 488/448 e 1 to 0 dl 1573000131 ref 1 fl Interpret:/0/0 rc 0/0
[167732.278428] Lustre: fir-MDT0002: Bulk IO write error with 67360d0f-602d-e0fd-a763-b6dc0eec238b (at 10.8.27.35@o2ib6), client will retry: rc = -110
[167732.278430] Lustre: Skipped 1 previous similar message
[167732.320957] LustreError: 67993:0:(ldlm_lib.c:3256:target_bulk_io()) Skipped 4 previous similar messages
[167732.572413] Lustre: 67608:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1573000095/real 1573000095] req@ffff9a6fe8343a80 x1649329896413424/t0(0) o104->fir-MDT0002@10.8.8.31@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1573000102 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[167733.296430] LustreError: 68022:0:(ldlm_lib.c:3256:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff9a71bd0a9850 x1648768509134096/t0(0) o4->4fb4463b-4df1-b2ca-bcaf-03821e29c498@10.8.8.31@o2ib6:666/0 lens 488/448 e 1 to 0 dl 1573000131 ref 1 fl Interpret:/0/0 rc 0/0
[167733.298442] Lustre: fir-MDT0002: Bulk IO write error with 4fb4463b-4df1-b2ca-bcaf-03821e29c498 (at 10.8.8.31@o2ib6), client will retry: rc = -110
[167733.298445] Lustre: Skipped 4 previous similar messages
[167733.338906] LustreError: 68022:0:(ldlm_lib.c:3256:target_bulk_io()) Skipped 4 previous similar messages
[167738.366562] LustreError: 68016:0:(ldlm_lib.c:3256:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff9a5de33ac850 x1649289034286512/t0(0) o4->79b3ac8f-5d21-fd41-5df4-dc314ce4868b@10.8.25.8@o2ib6:666/0 lens 504/448 e 1 to 0 dl 1573000131 ref 1 fl Interpret:/0/0 rc 0/0
[167738.390628] Lustre: fir-MDT0002: Bulk IO write error with 79b3ac8f-5d21-fd41-5df4-dc314ce4868b (at 10.8.25.8@o2ib6), client will retry: rc = -110
[167738.403759] Lustre: Skipped 3 previous similar messages
[167740.624612] LustreError: 67993:0:(ldlm_lib.c:3256:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff9a71bd0a8050 x1648768509134096/t0(0) o4->4fb4463b-4df1-b2ca-bcaf-03821e29c498@10.8.8.31@o2ib6:668/0 lens 488/448 e 0 to 0 dl 1573000133 ref 1 fl Interpret:/2/0 rc 0/0
[167740.624634] Lustre: fir-MDT0002: Bulk IO write error with 4fb4463b-4df1-b2ca-bcaf-03821e29c498 (at 10.8.8.31@o2ib6), client will retry: rc = -110
[167740.624636] Lustre: Skipped 7 previous similar messages
[167740.667108] LustreError: 67993:0:(ldlm_lib.c:3256:target_bulk_io()) Skipped 8 previous similar messages
[167744.738852] LNetError: 67102:0:(peer.c:3451:lnet_peer_ni_add_to_recoveryq_locked()) lpni 10.0.10.201@o2ib7 added to recovery queue. Health = 900
[167744.738864] LustreError: 67256:0:(sec.c:2485:sptlrpc_svc_unwrap_bulk()) @@@ truncated bulk GET 0(14810) req@ffff9a5dc759d850 x1649288990160288/t0(0) o4->e05cc48d-6b25-1271-2d5a-f23d00ab4bcf@10.8.24.2@o2ib6:682/0 lens 504/448 e 0 to 0 dl 1573000147 ref 1 fl Interpret:/0/0 rc 0/0
[167744.738882] Lustre: fir-MDT0002: Bulk IO write error with e05cc48d-6b25-1271-2d5a-f23d00ab4bcf (at 10.8.24.2@o2ib6), client will retry: rc = -110
[167744.738884] Lustre: Skipped 1 previous similar message
[167746.174746] Lustre: 68180:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1573000109/real 1573000109] req@ffff9a6b1563cc80 x1649329896658784/t0(0) o105->fir-MDT0002@10.8.27.35@o2ib6:15/16 lens 304/224 e 0 to 1 dl 1573000116 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[167746.177745] LustreError: 68016:0:(ldlm_lib.c:3256:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff9a603c889050 x1649068446092752/t0(0) o4->67360d0f-602d-e0fd-a763-b6dc0eec238b@10.8.27.35@o2ib6:687/0 lens 488/448 e 0 to 0 dl 1573000152 ref 1 fl Interpret:/0/0 rc 0/0
[167749.796169] LNetError: 67102:0:(peer.c:3451:lnet_peer_ni_add_to_recoveryq_locked()) lpni 10.0.10.202@o2ib7 added to recovery queue. Health = 900
[167749.796179] LustreError: 67980:0:(sec.c:2485:sptlrpc_svc_unwrap_bulk()) @@@ truncated bulk GET 0(8192) req@ffff9a702f341050 x1649289034286512/t0(0) o4->79b3ac8f-5d21-fd41-5df4-dc314ce4868b@10.8.25.8@o2ib6:673/0 lens 504/448 e 0 to 0 dl 1573000138 ref 1 fl Interpret:/2/0 rc 0/0
[167754.698957] LustreError: 67964:0:(ldlm_lib.c:3256:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff9a71bd0a8050 x1648768509134064/t0(0) o4->4fb4463b-4df1-b2ca-bcaf-03821e29c498@10.8.8.31@o2ib6:682/0 lens 488/448 e 0 to 0 dl 1573000147 ref 1 fl Interpret:/2/0 rc 0/0
[167754.723005] LustreError: 67964:0:(ldlm_lib.c:3256:target_bulk_io()) Skipped 8 previous similar messages
[167754.732506] Lustre: fir-MDT0002: Bulk IO write error with 4fb4463b-4df1-b2ca-bcaf-03821e29c498 (at 10.8.8.31@o2ib6), client will retry: rc = -110
[167754.745647] Lustre: Skipped 9 previous similar messages
[167757.216760] LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.24.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server.
[167757.234049] LustreError: Skipped 11 previous similar messages
[167766.599925] LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.28.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server.
[167766.617214] LustreError: Skipped 34 previous similar messages
[167768.587304] Lustre: 68111:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1573000131/real 1573000131] req@ffff9a73582d8900 x1649329896910560/t0(0) o106->fir-MDT0002@10.8.0.82@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1573000138 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[167776.486207] Lustre: fir-MDT0002: Connection restored to c20915b7-72a8-8f0f-a961-7c81095a2283 (at 10.8.23.29@o2ib6)
[167776.496641] Lustre: Skipped 211 previous similar messages
[167786.685937] LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server.
[167786.703306] LustreError: Skipped 38 previous similar messages
[167796.352504] Lustre: fir-MDT0002: Client 83541b6a-8b7b-dcec-f7fc-b8cc4b0f1367 (at 10.8.22.31@o2ib6) reconnecting
[167796.362678] Lustre: Skipped 232 previous similar messages
[167825.617731] LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.22.27@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server.
[167825.635104] LustreError: Skipped 136 previous similar messages
[167838.348025] LustreError: 67964:0:(ldlm_lib.c:3256:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff9a702f343850 x1649068446092752/t0(0) o4->67360d0f-602d-e0fd-a763-b6dc0eec238b@10.8.27.35@o2ib6:24/0 lens 488/448 e 0 to 0 dl 1573000244 ref 1 fl Interpret:/2/0 rc 0/0
[167838.350074] Lustre: fir-MDT0002: Bulk IO write error with 67360d0f-602d-e0fd-a763-b6dc0eec238b (at 10.8.27.35@o2ib6), client will retry: rc = -110
[167838.350076] Lustre: Skipped 2 previous similar messages
[167838.390672] LustreError: 67964:0:(ldlm_lib.c:3256:target_bulk_io()) Skipped 3 previous similar messages
[167865.545705] Lustre: 67632:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1573000228/real 1573000228] req@ffff9a81a913ec00 x1649329898068000/t0(0) o106->fir-MDT0002@10.9.107.60@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1573000235 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[167865.573133] Lustre: 67632:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
[167874.895939] LustreError: 68060:0:(ldlm_lib.c:3256:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff9a71b654a850 x1649381411730576/t0(0) o4->c6b36f94-67b7-7ffb-733c-d0e83ca0d57f@10.9.106.17@o2ib4:30/0 lens 488/448 e 0 to 0 dl 1573000250 ref 1 fl Interpret:/0/0 rc 0/0
[167874.897956] Lustre: fir-MDT0002: Bulk IO write error with c6b36f94-67b7-7ffb-733c-d0e83ca0d57f (at 10.9.106.17@o2ib4), client will retry: rc = -110
[167874.897958] Lustre: Skipped 1 previous similar message
[167874.938605] LustreError: 68060:0:(ldlm_lib.c:3256:target_bulk_io()) Skipped 2 previous similar messages
[167878.845503] LNetError: 67102:0:(peer.c:3451:lnet_peer_ni_add_to_recoveryq_locked()) lpni 10.0.10.212@o2ib7 added to recovery queue. Health = 900
[167878.845519] LustreError: 67925:0:(sec.c:2485:sptlrpc_svc_unwrap_bulk()) @@@ truncated bulk GET 0(131072) req@ffff9a8deea99050 x1649316080553376/t0(0) o4->34004248-b9f7-fa76-67ab-9379f67ee678@10.9.117.45@o2ib4:58/0 lens 488/448 e 0 to 0 dl 1573000278 ref 1 fl Interpret:/0/0 rc 0/0
[167888.883974] LustreError: 68060:0:(sec.c:2485:sptlrpc_svc_unwrap_bulk()) @@@ truncated bulk GET 0(8192) req@ffff9a6e7b317050 x1648687723228496/t0(0) o4->49b7ea94-4577-eca3-1515-b1c520941f2a@10.9.104.43@o2ib4:58/0 lens 504/448 e 0 to 0 dl 1573000278 ref 1 fl Interpret:/2/0 rc 0/0
[167900.696226] LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.9.103.62@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server.
[167900.713684] LustreError: Skipped 207 previous similar messages
[167913.885301] LNetError: 67102:0:(peer.c:3451:lnet_peer_ni_add_to_recoveryq_locked()) lpni 10.0.10.210@o2ib7 added to recovery queue. Health = 900
[167913.885311] LustreError: 68158:0:(sec.c:2485:sptlrpc_svc_unwrap_bulk()) @@@ truncated bulk GET 0(8192) req@ffff9a81aaacc050 x1648687723228496/t0(0) o4->49b7ea94-4577-eca3-1515-b1c520941f2a@10.9.104.43@o2ib4:83/0 lens 504/448 e 0 to 0 dl 1573000303 ref 1 fl Interpret:/2/0 rc 0/0
[167924.383232] Lustre: fir-MDT0002: Client 7c7c6c55-ee0a-39dc-ced8-86854b97f795 (at 10.9.116.4@o2ib4) reconnecting
[167924.393415] Lustre: Skipped 672 previous similar messages
[167926.817305] Lustre: fir-MDT0002: haven't heard from client ea43cad7-8e30-4e17-f067-dc042f6e8696 (at 10.8.30.24@o2ib6) in 194 seconds. I think it's dead, and I am evicting it. exp ffff9a70c9b31400, cur 1573000297 expire 1573000147 last 1573000103
[167933.927703] LNetError: 67102:0:(peer.c:3451:lnet_peer_ni_add_to_recoveryq_locked()) lpni 10.0.10.210@o2ib7 added to recovery queue. Health = 900
[167933.927714] LustreError: 68130:0:(ldlm_lib.c:3271:target_bulk_io()) @@@ truncated bulk READ 0(4096) req@ffff9a737c94cc80 x1649325851086352/t0(0) o37->a8270da0-0393-37e4-ff36-7fec1cb9e404@10.9.101.57@o2ib4:100/0 lens 448/440 e 1 to 0 dl 1573000320 ref 1 fl Interpret:/0/0 rc 0/0
[168062.819693] Lustre: fir-MDT0002: haven't heard from client 29e66763-b95c-3d3e-5532-53facc0d6b7a (at 10.9.109.32@o2ib4) in 155 seconds. I think it's dead, and I am evicting it. exp ffff9a91b68fd000, cur 1573000433 expire 1573000283 last 1573000278
[168419.541385] Lustre: fir-MDT0002: Client eae7c8f6-dc9e-7749-c682-cf3cd37373ad (at 10.9.103.46@o2ib4) reconnecting
[168419.551652] Lustre: Skipped 572 previous similar messages
[168419.557164] Lustre: fir-MDT0002: Connection restored to eae7c8f6-dc9e-7749-c682-cf3cd37373ad (at 10.9.103.46@o2ib4)
[168419.567698] Lustre: Skipped 1270 previous similar messages
[169204.688057] LNetError: 67085:0:(o2iblnd_cb.c:3350:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds
[169204.698316] LNetError: 67085:0:(o2iblnd_cb.c:3425:kiblnd_check_conns()) Timed out RDMA with 10.0.10.212@o2ib7 (21): c: 6, oc: 0, rc: 8
[169214.067647] LNetError: 87291:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[169214.079665] LNetError: 87291:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 8 previous similar messages
[169249.070653] LNetError: 87291:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[169249.082654] LNetError: 87291:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 2 previous similar messages
[169257.689389] LNetError: 67085:0:(o2iblnd_cb.c:3350:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds
[169257.699696] LNetError: 67085:0:(o2iblnd_cb.c:3425:kiblnd_check_conns()) Timed out RDMA with 10.0.10.203@o2ib7 (106): c: 7, oc: 0, rc: 8
[169257.712253] LNetError: 67094:0:(peer.c:3451:lnet_peer_ni_add_to_recoveryq_locked()) lpni 10.0.10.203@o2ib7 added to recovery queue. Health = 900
[169324.072443] LNetError: 86980:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[169324.084435] LNetError: 86980:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 9 previous similar messages
[169344.691569] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds
[169344.701828] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 8 previous similar messages
[169379.692442] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.212@o2ib7: 0 seconds
[169379.702700] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 5 previous similar messages
[169387.074627] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.212@o2ib7: -125
[169449.694229] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.212@o2ib7: 0 seconds
[169449.704485] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 12 previous similar messages
[169454.694353] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[169454.706355] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 14 previous similar messages
[169496.858151] Lustre: fir-MDT0002: haven't heard from client f5b732ff-4959-4283-d29c-fcd8fac11c91 (at 10.9.113.1@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a700a3be400, cur 1573001867 expire 1573001717 last 1573001640
[169566.098184] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.203@o2ib7: -125
[169589.698765] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds
[169589.709023] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 23 previous similar messages
[169693.114426] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.212@o2ib7: -125
[169719.701144] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[169719.713141] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 31 previous similar messages
[169858.704677] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds
[169858.714932] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 42 previous similar messages
[169866.138869] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.203@o2ib7: -125
[169961.026163] Lustre: fir-MDT0002: Connection restored to 8e78bac9-a029-aa75-eab0-eebed35df7a4 (at 10.9.106.27@o2ib4)
[169994.159142] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.212@o2ib7: -125
[170174.178719] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.203@o2ib7: -125
[170234.714287] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[170234.726280] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 55 previous similar messages
[170296.197861] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.212@o2ib7: -125
[170374.717880] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds
[170374.728138] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 94 previous similar messages
[170475.220459] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.203@o2ib7: -125
[170590.241407] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.212@o2ib7: -125
[170638.605071] Lustre: fir-MDT0002: Connection restored to (at 10.9.101.43@o2ib4)
[170775.260258] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.203@o2ib7: -125
[170839.729902] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[170839.741928] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 65 previous similar messages
[170974.733377] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds
[170974.743633] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 103 previous similar messages
[171076.286103] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.203@o2ib7: -125
[171076.298193] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 1 previous similar message
[171440.745683] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[171440.757678] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 59 previous similar messages
[171575.749231] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 1 seconds
[171575.759492] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 107 previous similar messages
[171672.344826] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.203@o2ib7: -125
[171672.356906] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 3 previous similar messages
[172045.297815] LNetError: 87291:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[172045.309817] LNetError: 87291:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 69 previous similar messages
[172179.765454] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds
[172179.775715] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 111 previous similar messages
[172279.417137] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.203@o2ib7: -125
[172279.429221] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 3 previous similar messages
[172655.361250] LNetError: 88216:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
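Note on the "Error sending GET ... -125" resend failures above: the trailing number is a raw kernel errno, and on Linux 125 is ECANCELED, i.e. the queued resend to 10.0.10.203/212@o2ib7 was cancelled (presumably because the underlying o2iblnd connection kept timing out) rather than ever reaching the wire. Python's errno table confirms the code:

import errno, os

print(errno.errorcode[125])  # 'ECANCELED' (Linux errno numbering)
print(os.strerror(125))      # 'Operation canceled'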
[172655.373246] LNetError: 88216:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 70 previous similar messages
[172779.781625] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds
[172779.791879] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 107 previous similar messages
[173004.478707] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.212@o2ib7: -125
[173004.490785] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 4 previous similar messages
[173169.952986] Lustre: fir-MDT0002: haven't heard from client a3c2090f-0eca-cd39-65df-d4c926cfe4e9 (at 10.8.27.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a91bed34000, cur 1573005540 expire 1573005390 last 1573005313
[173259.794575] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[173259.806571] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 71 previous similar messages
[173335.736798] Lustre: fir-MDT0002: Connection restored to (at 10.9.106.24@o2ib4)
[173379.797816] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds
[173379.808081] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 103 previous similar messages
[173785.543920] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.203@o2ib7: -125
[173785.556009] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 4 previous similar messages
[173859.810947] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[173859.822945] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 67 previous similar messages
[173878.993512] Lustre: fir-MDT0002: Connection restored to (at 10.9.106.16@o2ib4)
[173984.814411] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.212@o2ib7: 0 seconds
[173984.824671] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 105 previous similar messages
[174464.827549] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[174464.839586] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 64 previous similar messages
[174509.606745] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.212@o2ib7: -125
[174509.618824] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 4 previous similar messages
[174595.830959] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 1 seconds
[174595.841220] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 102 previous similar messages
[174729.970497] Lustre: fir-MDT0002: Connection restored to a3c2090f-0eca-cd39-65df-d4c926cfe4e9 (at 10.8.27.24@o2ib6)
[175069.843369] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[175069.855366] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 66 previous similar messages
[175111.659449] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.212@o2ib7: -125
[175111.671539] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 3 previous similar messages
[175199.846752] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.212@o2ib7: 0 seconds
[175199.857010] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 102 previous similar messages
[175679.859309] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[175679.871301] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 84 previous similar messages
[175799.862446] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds
[175799.872700] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 105 previous similar messages
[175885.728697] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.203@o2ib7: -125
[175885.740782] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 4 previous similar messages
[176279.874921] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[176279.886921] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 71 previous similar messages
[176409.878297] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds
[176409.888558] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 107 previous similar messages
[176494.793501] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.203@o2ib7: -125
[176494.805583] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 3 previous similar messages
[176884.890609] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[176884.902600] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 66 previous similar messages
[177014.893980] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.212@o2ib7: 0 seconds
[177014.904235] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 103 previous similar messages
[177095.843081] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.203@o2ib7: -125
[177095.855165] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 3 previous similar messages
[177484.906194] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[177484.918198] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 66 previous similar messages
[177614.909558] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds
[177614.919812] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 104 previous similar messages
[177797.074697] Lustre: fir-MDT0002: haven't heard from client 8088fb82-69d6-5f55-6a4c-3369f0c19cb6 (at 10.9.113.2@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6dd87a9400, cur 1573010167 expire 1573010017 last 1573009940
[177820.892899] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.212@o2ib7: -125
[177820.904987] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 4 previous similar messages
[178090.921929] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[178090.933945] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 65 previous similar messages
[178220.925312] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds
[178220.935566] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 107 previous similar messages
[178600.979191] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.203@o2ib7: -125
[178600.991268] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 3 previous similar messages
[178699.184745] LNetError: 89921:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[178699.196738] LNetError: 89921:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 72 previous similar messages
[178820.940909] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds
[178820.951165] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 101 previous similar messages
[179301.012296] LNetError: 89921:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[179301.024296] LNetError: 89921:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 62 previous similar messages
[179326.039950] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.212@o2ib7: -125
[179326.052030] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 4 previous similar messages
[179420.956403] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds
[179420.966665] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 97 previous similar messages
[179910.969117] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[179910.981118] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 74 previous similar messages
[180025.972081] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds
[180025.982338] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 105 previous similar messages
[180106.115129] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.203@o2ib7: -125
[180106.127211] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 4 previous similar messages
[180514.984715] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[180514.996711] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 68 previous similar messages
[180639.987943] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds
[180639.998204] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 105 previous similar messages
[180709.187738] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.203@o2ib7: -125
[180709.199825] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 3 previous similar messages
[181119.000447] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[181119.012450] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 72 previous similar messages
[181240.003620] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds
[181240.013876] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 108 previous similar messages
[181311.259531] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.203@o2ib7: -125
[181311.271616] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 3 previous similar messages
[181720.016349] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[181720.028349] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 70 previous similar messages
[181840.019540] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds
[181840.029804] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 108 previous similar messages
[182034.335696] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.212@o2ib7: -125
[182034.347777] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 4 previous similar messages
[182320.033336] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[182320.045328] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 72 previous similar messages
[182440.035538] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.212@o2ib7: 0 seconds
[182440.045799] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 106 previous similar messages
[182816.436496] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.203@o2ib7: -125
[182816.448617] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 4 previous similar messages
[182921.049280] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[182921.061279] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 70 previous similar messages
[183045.052607] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds
[183045.062863] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 103 previous similar messages
[183525.064394] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[183525.076395] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 65 previous similar messages
[183539.501770] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.212@o2ib7: -125
[183539.513853] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 4 previous similar messages
[183655.067847] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds
[183655.078109] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 103 previous similar messages
[183876.528616] Lustre: 67135:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1573016238/real 1573016238] req@ffff9a81ba370000 x1649330049530432/t0(0) o103->fir-MDT0000-osp-MDT0002@10.0.10.51@o2ib7:17/18 lens 328/224 e 0 to 1 dl 1573016245 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[183876.557045] Lustre: 67135:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 1 previous similar message
[183876.566827] Lustre: fir-MDT0000-osp-MDT0002: Connection to fir-MDT0000 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
[183879.096690] Lustre: 67639:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1573016241/real 1573016241] req@ffff9a6e13c81f80 x1649330049578896/t0(0) o101->fir-MDT0000-osp-MDT0002@10.0.10.51@o2ib7:24/4 lens 328/344 e 0 to 1 dl 1573016248 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
[183881.504740] Lustre: 67154:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1573016243/real 1573016243] req@ffff9a91a4b1da00 x1649330049594112/t0(0) o41->fir-MDT0000-osp-MDT0002@10.0.10.51@o2ib7:24/4 lens 224/368 e 0 to 1 dl 1573016250 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[183908.361440] Lustre: 67135:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1573016271/real 1573016271] req@ffff9a7869584800 x1649330049952608/t0(0) o400->fir-MDT0000-lwp-MDT0002@10.0.10.51@o2ib7:12/10 lens 224/224 e 0 to 1 dl 1573016278 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
[183908.389913] Lustre: fir-MDT0000-lwp-MDT0002: Connection to fir-MDT0000 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
[183908.865462] LustreError: 166-1: MGC10.0.10.51@o2ib7: Connection to MGS (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will fail
[183926.039425] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.114.2@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server.
[183926.056849] LustreError: Skipped 652 previous similar messages
[183928.137186] Lustre: fir-MDT0002: Received new LWP connection from 10.0.10.51@o2ib7, removing former export from same NID
[183928.148158] Lustre: fir-MDT0002: Connection restored to (at 10.0.10.51@o2ib7)
[183944.790426] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.107.44@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server.
[183944.807884] LustreError: Skipped 1015 previous similar messages
[183958.955124] Lustre: MGC10.0.10.51@o2ib7: Connection restored to MGC10.0.10.51@o2ib7_0 (at 10.0.10.51@o2ib7)
[183983.702574] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.8.28.8@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server.
[183983.719889] LustreError: Skipped 327 previous similar messages
[183984.043575] Lustre: fir-MDT0000-osp-MDT0002: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7)
[184009.132259] Lustre: fir-MDT0000-lwp-MDT0002: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7)
[184129.080166] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[184129.092159] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 70 previous similar messages
[184241.245618] Lustre: fir-MDT0002: haven't heard from client 4737d7cc-3e1f-a8cc-964f-c8d597fce061 (at 10.8.27.25@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a91b5c01800, cur 1573016611 expire 1573016461 last 1573016384
[184255.083414] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds
[184255.093672] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 103 previous similar messages
[184276.345960] Lustre: 67842:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1573016639/real 1573016639] req@ffff9a619faf7080 x1649330056595504/t0(0) o106->fir-MDT0002@10.9.115.5@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1573016646 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[184276.373351] Lustre: 67842:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 1 previous similar message
[184290.383313] Lustre: 67842:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1573016653/real 1573016653] req@ffff9a619faf7080 x1649330056595504/t0(0) o106->fir-MDT0002@10.9.115.5@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1573016660 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[184290.410737] Lustre: 67842:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 1 previous similar message
[184311.420845] Lustre: 67842:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1573016674/real 1573016674] req@ffff9a619faf7080 x1649330056595504/t0(0) o106->fir-MDT0002@10.9.115.5@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1573016681 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[184311.448256] Lustre: 67842:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
[184321.589109] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.203@o2ib7: -125
[184321.601214] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 4 previous similar messages
[184346.459752] Lustre: 67842:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1573016709/real 1573016709] req@ffff9a619faf7080 x1649330056595504/t0(0) o106->fir-MDT0002@10.9.115.5@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1573016716 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[184346.487104] Lustre: 67842:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 4 previous similar messages
[184416.499555] Lustre: 67842:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1573016779/real 1573016779] req@ffff9a619faf7080 x1649330056595504/t0(0) o106->fir-MDT0002@10.9.115.5@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1573016786 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[184416.526894] Lustre: 67842:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 9 previous similar messages
[184439.250510] Lustre: fir-MDT0002: haven't heard from client e4afa95b-e7ac-30df-de3d-de81555307ba (at 10.9.115.5@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a87585ef400, cur 1573016809 expire 1573016659 last 1573016582
[184439.272331] LustreError: 67842:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) ### client (nid 10.9.115.5@o2ib4) failed to reply to glimpse AST (req@ffff9a619faf7080 x1649330056595504 status 0 rc -5), evict it ns: mdt-fir-MDT0002_UUID lock: ffff9a6f152f4800/0x746adaf73c1cc2c9 lrc: 4/0,0 mode: PW/PW res: [0x2c0032ec6:0x10d12:0x0].0x0 bits 0x40/0x0 rrc: 8 type: IBT flags: 0x40200000000000 nid: 10.9.115.5@o2ib4 remote: 0xd9d3910fd37af1c9 expref: 316296 pid: 67834 timeout: 0 lvb_type: 0
[184439.314887] LustreError: 138-a: fir-MDT0002: A client on nid 10.9.115.5@o2ib4 was evicted due to a lock glimpse callback time out: rc -5
[184439.327277] LustreError: 67208:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 2689s: evicting client at 10.9.115.5@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff9a6f152f4800/0x746adaf73c1cc2c9 lrc: 3/0,0 mode: PW/PW res: [0x2c0032ec6:0x10d12:0x0].0x0 bits 0x40/0x0 rrc: 8 type: IBT flags: 0x40200000000000 nid: 10.9.115.5@o2ib4 remote: 0xd9d3910fd37af1c9 expref: 311022 pid: 67834 timeout: 0 lvb_type: 0
[184730.095680] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[184730.107720] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 64 previous similar messages
[184855.098928] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds
[184855.109187] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 100 previous similar messages
[185043.659736] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.212@o2ib7: -125
[185043.671833] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 4 previous similar messages
[185332.112156] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[185332.124155] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 76 previous similar messages
[185461.114466] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 1 seconds
[185461.124724] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 104 previous similar messages
[185553.237966] Lustre: fir-MDT0002: Connection restored to e4afa95b-e7ac-30df-de3d-de81555307ba (at 10.9.115.5@o2ib4)
[185824.769751] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.203@o2ib7: -125
[185824.781854] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 4 previous similar messages
[185934.126564] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[185934.138563] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 71 previous similar messages
[186031.383357] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.117.18@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server.
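Note on timestamps: every line carries a seconds-since-boot stamp, while the eviction messages also print wall-clock epoch seconds, so any line that has both fixes the boot epoch and lets every bracketed stamp be converted to UTC. Using the [184439.250510] eviction above (cur 1573016809), a sketch:

from datetime import datetime, timezone

# "[184439.250510] ... evicting it. exp ..., cur 1573016809 ..." pairs a
# dmesg stamp with wall-clock time, so boot epoch = cur - stamp.
boot_epoch = 1573016809 - 184439   # ~1572832370 (early November 2019)

def wallclock(dmesg_secs):
    return datetime.fromtimestamp(boot_epoch + dmesg_secs, tz=timezone.utc)

print(wallclock(184439.250510))  # the glimpse-AST eviction of 10.9.115.5@o2ib4
print(wallclock(167033.214383))  # the first o2iblnd fatal error in this excerpt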
[186031.400815] LustreError: Skipped 10 previous similar messages
[186065.129908] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds
[186065.140170] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 111 previous similar messages
[186131.738672] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.117.18@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server.
[186232.095644] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.117.18@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server.
[186332.447233] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.117.18@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server.
[186457.890777] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.117.18@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server.
[186536.141920] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[186536.153919] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 71 previous similar messages
[186549.995261] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.212@o2ib7: -125
[186550.007348] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 4 previous similar messages
[186670.145323] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds
[186670.155579] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 118 previous similar messages
[186671.305375] Lustre: fir-MDT0002: haven't heard from client 49a7bb91-1e44-9061-7b3f-d5e25fd318ce (at 10.9.106.28@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a61a9a2fc00, cur 1573019041 expire 1573018891 last 1573018814
[186708.781251] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.117.18@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server.
[186708.798731] LustreError: Skipped 1 previous similar message
[187110.202721] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.117.18@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server.
[187110.220202] LustreError: Skipped 2 previous similar messages
[187139.157237] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[187139.169232] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 73 previous similar messages
[187271.160592] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.212@o2ib7: 0 seconds
[187271.170850] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 111 previous similar messages
[187330.171099] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.203@o2ib7: -125
[187330.183183] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 4 previous similar messages
[187527.328193] Lustre: fir-MDT0002: haven't heard from client fc85a6cc-3249-1d3e-9a39-9bb09055d536 (at 10.9.105.33@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a61b9672000, cur 1573019897 expire 1573019747 last 1573019670
[187579.432792] Lustre: fir-MDT0002: Connection restored to fc85a6cc-3249-1d3e-9a39-9bb09055d536 (at 10.9.105.33@o2ib4)
[187712.326082] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.117.18@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server.
[187712.343547] LustreError: Skipped 4 previous similar messages
[187741.172709] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[187741.184703] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 81 previous similar messages
[187875.176183] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.212@o2ib7: 0 seconds
[187875.186440] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 117 previous similar messages
[188055.403806] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.212@o2ib7: -125
[188055.415895] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 4 previous similar messages
[188145.508637] Lustre: fir-MDT0002: Connection restored to (at 10.9.106.28@o2ib4)
[188267.475233] Lustre: 67775:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1573020630/real 1573020630] req@ffff9a55567a0900 x1649330132713872/t0(0) o106->fir-MDT0002@10.9.117.39@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1573020637 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[188267.502668] Lustre: 67775:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 4 previous similar messages
[188268.345720] Lustre: fir-MDT0002: Connection restored to (at 10.9.117.6@o2ib4)
[188273.101765] Lustre: fir-MDT0002: Connection restored to (at 10.9.117.17@o2ib4)
[188273.109164] Lustre: Skipped 1 previous similar message
[188282.938175] Lustre: fir-MDT0002: Connection restored to (at 10.9.117.15@o2ib4)
[188288.477745] Lustre: 67747:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1573020651/real 1573020651] req@ffff9a5a1f5a2400 x1649330132713904/t0(0) o106->fir-MDT0002@10.9.117.39@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1573020658 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[188288.505172] Lustre: 67747:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 4 previous similar messages
[188295.349750] Lustre: fir-MDT0002: haven't heard from client 12407c96-86ba-70af-c33d-457a1fb9e45e (at 10.9.117.2@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a90a3675c00, cur 1573020665 expire 1573020515 last 1573020438
[188295.372375] LustreError: 67747:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) ### client (nid 10.9.117.39@o2ib4) failed to reply to glimpse AST (req@ffff9a5a1f5a2400 x1649330132713904 status 0 rc -5), evict it ns: mdt-fir-MDT0002_UUID lock: ffff9a6fb47ac5c0/0x746adaf6c59146ef lrc: 4/0,0 mode: PW/PW res: [0x2c0032e21:0x942:0x0].0x0 bits 0x40/0x0 rrc: 7 type: IBT flags: 0x40200000000000 nid: 10.9.117.39@o2ib4 remote: 0x6ac7aeec8dbfa7cc expref: 153 pid: 67743 timeout: 0 lvb_type: 0
[188295.372380] LustreError: 138-a: fir-MDT0002: A client on nid 10.9.117.39@o2ib4 was evicted due to a lock glimpse callback time out: rc -5
[188295.372402] LustreError: 67208:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 15026s: evicting client at 10.9.117.39@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff9a68a8fd6300/0x746adaf71c7d88de lrc: 3/0,0 mode: PW/PW res: [0x2c0032e21:0x941:0x0].0x0 bits 0x40/0x0 rrc: 6 type: IBT flags: 0x40200000000000 nid: 10.9.117.39@o2ib4 remote: 0x6ac7aeec8dc49673 expref: 150 pid: 67519 timeout: 0 lvb_type: 0
[188295.464860] LustreError: 67747:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) Skipped 1 previous similar message
[188298.328925] Lustre: fir-MDT0002: Connection restored to ec5357f4-3a41-e113-e586-2392fb551089 (at 10.9.117.23@o2ib4)
[188298.339455] Lustre: Skipped 4 previous similar messages
[188316.247961] Lustre: fir-MDT0002: Connection restored to eedd3a24-54dd-1112-d728-4293c06f59f0 (at 10.9.117.27@o2ib4)
[188316.258494] Lustre: Skipped 5 previous similar messages
[188318.460066] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.117.6@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server.
[188318.477457] LustreError: Skipped 13 previous similar messages
[188343.188176] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[188343.200188] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 74 previous similar messages
[188354.471854] Lustre: fir-MDT0002: Connection restored to f1899e2a-cbcd-0ddb-3a11-802565232454 (at 10.9.117.13@o2ib4)
[188354.482382] Lustre: Skipped 15 previous similar messages
[188371.348964] Lustre: fir-MDT0002: haven't heard from client f1899e2a-cbcd-0ddb-3a11-802565232454 (at 10.9.117.13@o2ib4) in 194 seconds. I think it's dead, and I am evicting it. exp ffff9a8de87eec00, cur 1573020741 expire 1573020591 last 1573020547
[188371.370839] Lustre: Skipped 30 previous similar messages
[188475.191655] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds
[188475.201925] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 121 previous similar messages
[188835.671840] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.203@o2ib7: -125
[188835.683928] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 4 previous similar messages
[188924.547316] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.117.14@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server.
[188924.564783] LustreError: Skipped 185 previous similar messages
[188945.203651] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[188945.215646] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 76 previous similar messages
[189045.510219] Lustre: fir-MDT0002: Client 51315fc7-c4b3-f078-d969-3ad7a610223a (at 10.8.8.32@o2ib6) reconnecting
[189045.520321] Lustre: fir-MDT0002: Connection restored to 51315fc7-c4b3-f078-d969-3ad7a610223a (at 10.8.8.32@o2ib6)
[189045.530666] Lustre: Skipped 1 previous similar message
[189045.828201] Lustre: 67834:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1573021408/real 1573021408] req@ffff9a71b1f35580 x1649330145584960/t0(0) o104->fir-MDT0002@10.8.27.35@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1573021415 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[189045.855547] Lustre: 67834:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 5 previous similar messages
[189046.049208] LustreError: 68063:0:(ldlm_lib.c:3256:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff9a69c507f850 x1648297167946080/t0(0) o4->8f2648b4-4022-d79e-18a5-f850119b4e30@10.8.17.16@o2ib6:82/0 lens 488/448 e 1 to 0 dl 1573021442 ref 1 fl Interpret:/0/0 rc 0/0
[189046.049235] Lustre: fir-MDT0002: Bulk IO write error with 8f2648b4-4022-d79e-18a5-f850119b4e30 (at 10.8.17.16@o2ib6), client will retry: rc = -110
[189046.049238] Lustre: Skipped 16 previous similar messages
[189046.091872] LustreError: 68063:0:(ldlm_lib.c:3256:target_bulk_io()) Skipped 15 previous similar messages
[189047.373903] Lustre: fir-MDT0002: haven't heard from client c7da60e2-1da2-1aa0-71e3-83d7cefee0c5 (at 10.9.112.3@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a91abf77c00, cur 1573021417 expire 1573021267 last 1573021190
[189050.598329] Lustre: 67773:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1573021413/real 1573021413] req@ffff9a891de0e780 x1649330145587872/t0(0) o104->fir-MDT0002@10.9.102.33@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1573021420 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[189050.625748] Lustre: 67773:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
[189053.593893] Lustre: fir-MDT0002: Connection restored to 9e6019b2-e72a-be9a-07e3-b4bb84e4d17c (at 10.8.30.29@o2ib6)
[189053.604330] Lustre: Skipped 85 previous similar messages
[189054.710434] LNetError: 67102:0:(peer.c:3451:lnet_peer_ni_add_to_recoveryq_locked()) lpni 10.0.10.201@o2ib7 added to recovery queue. Health = 900
[189054.710443] LustreError: 67981:0:(ldlm_lib.c:3256:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff9a5581b8e050 x1648436242109376/t0(0) o4->89ce1cbf-19a1-5ea3-ec1d-d2cf9a23d0e3@10.8.30.34@o2ib6:80/0 lens 504/448 e 1 to 0 dl 1573021440 ref 1 fl Interpret:/0/0 rc 0/0
[189054.710446] LustreError: 67981:0:(ldlm_lib.c:3256:target_bulk_io()) Skipped 28 previous similar messages
[189054.710462] Lustre: fir-MDT0002: Bulk IO write error with 89ce1cbf-19a1-5ea3-ec1d-d2cf9a23d0e3 (at 10.8.30.34@o2ib6), client will retry: rc = -110
[189054.710463] Lustre: Skipped 34 previous similar messages
[189054.775700] LustreError: 68055:0:(sec.c:2485:sptlrpc_svc_unwrap_bulk()) @@@ truncated bulk GET 0(131072) req@ffff9a5460315850 x1648768339487792/t0(0) o4->edd9f4e3-d4ae-b940-acd7-6830893a6b9f@10.8.18.29@o2ib6:81/0 lens 488/448 e 1 to 0 dl 1573021441 ref 1 fl Interpret:/0/0 rc 0/0
[189054.800529] LustreError: 68055:0:(sec.c:2485:sptlrpc_svc_unwrap_bulk()) Skipped 7 previous similar messages
[189059.775569] LNetError: 67102:0:(peer.c:3451:lnet_peer_ni_add_to_recoveryq_locked()) lpni 10.0.10.210@o2ib7 added to recovery queue. Health = 900
[189059.775581] LustreError: 67982:0:(sec.c:2485:sptlrpc_svc_unwrap_bulk()) @@@ truncated bulk GET 0(112137) req@ffff9a74378df050 x1648382855896032/t0(0) o4->fb81a199-8d13-72c5-dc77-3ecd734ddf75@10.9.102.33@o2ib4:86/0 lens 488/448 e 1 to 0 dl 1573021446 ref 1 fl Interpret:/0/0 rc 0/0
[189059.813500] LNetError: 67102:0:(peer.c:3451:lnet_peer_ni_add_to_recoveryq_locked()) Skipped 1 previous similar message
[189059.857567] Lustre: 67940:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1573021422/real 1573021422] req@ffff9a71aa000900 x1649330145585120/t0(0) o104->fir-MDT0002@10.8.17.16@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1573021429 ref 1 fl Rpc:X/2/ffffffff rc -11/-1
[189059.885083] Lustre: 67940:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
[189064.823701] LustreError: 68027:0:(sec.c:2485:sptlrpc_svc_unwrap_bulk()) @@@ truncated bulk GET 0(131072) req@ffff9a603c88b050 x1648297167946112/t0(0) o4->8f2648b4-4022-d79e-18a5-f850119b4e30@10.8.17.16@o2ib6:93/0 lens 488/448 e 0 to 0 dl 1573021453 ref 1 fl Interpret:/2/0 rc 0/0
[189064.848531] LustreError: 68027:0:(sec.c:2485:sptlrpc_svc_unwrap_bulk()) Skipped 10 previous similar messages
[189069.695158] Lustre: fir-MDT0002: Connection restored to bb0489d8-99d9-bd6e-c7e4-6c2155fd6f79 (at 10.8.23.36@o2ib6)
[189069.705605] Lustre: Skipped 318 previous similar messages
[189074.823948] LNetError: 67102:0:(peer.c:3451:lnet_peer_ni_add_to_recoveryq_locked()) lpni 10.0.10.202@o2ib7 added to recovery queue. Health = 900
[189074.823957] LustreError: 68006:0:(sec.c:2485:sptlrpc_svc_unwrap_bulk()) @@@ truncated bulk GET 0(8192) req@ffff9a707f5e4050 x1648734572290656/t0(0) o4->1c279fe8-3e68-df63-5600-ae03a58a8f27@10.8.25.23@o2ib6:104/0 lens 504/448 e 1 to 0 dl 1573021464 ref 1 fl Interpret:/2/0 rc 0/0
[189074.823974] Lustre: fir-MDT0002: Bulk IO write error with 1c279fe8-3e68-df63-5600-ae03a58a8f27 (at 10.8.25.23@o2ib6), client will retry: rc = -110
[189074.823975] Lustre: Skipped 19 previous similar messages
[189074.880310] LNetError: 67102:0:(peer.c:3451:lnet_peer_ni_add_to_recoveryq_locked()) Skipped 1 previous similar message
[189080.207084] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds
[189080.217345] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 111 previous similar messages
[189101.698877] Lustre: fir-MDT0002: Connection restored to 544105de-8833-e38a-4ec5-601f76f65e5f (at 10.9.103.43@o2ib4)
[189101.709402] Lustre: Skipped 236 previous similar messages
[189107.655797] Lustre: 67609:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1573021470/real 1573021470] req@ffff9a76f5107080 x1649330145606672/t0(0) o104->fir-MDT0002@10.9.0.61@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1573021477 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[189109.747316] Lustre: fir-MDT0002: Client d027e407-9dbb-a5d4-00d4-c10acb273f18 (at 10.9.110.24@o2ib4) reconnecting
[189109.757572] Lustre: Skipped 725 previous similar messages
[189137.171562] LustreError: 67256:0:(ldlm_lib.c:3256:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff9a5de33af850 x1648660430874432/t0(0) o4->51315fc7-c4b3-f078-d969-3ad7a610223a@10.8.8.32@o2ib6:170/0 lens 488/448 e 1 to 0 dl 1573021530 ref 1 fl Interpret:/2/0 rc 0/0
[189137.171668] Lustre: fir-MDT0002: Bulk IO write error with 51315fc7-c4b3-f078-d969-3ad7a610223a (at 10.8.8.32@o2ib6), client will retry: rc = -110
[189137.208736] LustreError: 67256:0:(ldlm_lib.c:3256:target_bulk_io()) Skipped 2 previous similar messages
[189145.712782] LustreError: 67208:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 107s: evicting client at 10.8.27.35@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff9a6f98cbd580/0x746adaf77f1b4e7c lrc: 3/0,0 mode: PW/PW res: [0x2c00335a8:0x4baa:0x0].0x0 bits 0x40/0x0 rrc: 7 type: IBT flags: 0x60200400000020 nid: 10.8.27.35@o2ib6 remote: 0x7ad63685c15f7ac8 expref: 631336 pid: 67940 timeout: 189140 lvb_type: 0
[189145.750993] LustreError: 67738:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) ### client (nid 10.8.27.35@o2ib6) failed to reply to blocking AST (req@ffff9a5bfaa96300 x1649330145619840 status 0 rc -5), evict it ns: mdt-fir-MDT0002_UUID lock: ffff9a6dd7fe5c40/0x746adaf77f1cfcfc lrc: 4/0,0 mode: PR/PR res: [0x2c00335a8:0x4bad:0x0].0x0 bits 0x40/0x0 rrc: 7 type: IBT flags: 0x60000400000020 nid: 10.8.27.35@o2ib6 remote: 0x7ad63685c15f7b77 expref: 631335 pid: 67510 timeout: 189235 lvb_type: 0
[189145.793952] LustreError: 138-a: fir-MDT0002: A client on nid 10.8.27.35@o2ib6 was evicted due to a lock blocking callback time out: rc -5
[189145.806380] LustreError: Skipped 1 previous similar message
[189145.812076] LustreError: 67738:0:(ldlm_lockd.c:1348:ldlm_handle_enqueue0()) ### lock on destroyed export ffff9a91b9449400 ns: mdt-fir-MDT0002_UUID lock: ffff9a5ebeea9440/0x746adaf77f1cfd1f lrc: 3/0,0 mode: PW/PW res: [0x2c00335a8:0x4bad:0x0].0x0 bits 0x40/0x0 rrc: 5 type: IBT flags: 0x50200000000000 nid: 10.8.27.35@o2ib6 remote: 0x7ad63685c15f7b8c expref: 623989 pid: 67738 timeout: 0 lvb_type: 0
[189145.924783] LustreError: 68023:0:(ldlm_lib.c:3250:target_bulk_io()) @@@ Eviction on bulk WRITE req@ffff9a5dcd75e050 x1649068860244160/t0(0) o4->67360d0f-602d-e0fd-a763-b6dc0eec238b@10.8.27.35@o2ib6:193/0 lens 488/448 e 0 to 0 dl 1573021553 ref 1 fl Interpret:/2/0 rc 0/0
[189151.750935] Lustre: 67652:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1573021514/real 1573021514] req@ffff9a5bfab8da00 x1649330145621216/t0(0) o106->fir-MDT0002@10.9.117.43@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1573021521 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[189151.778357] Lustre: 67652:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 6 previous similar messages
[189156.474017] LNet: 67091:0:(o2iblnd_cb.c:413:kiblnd_handle_rx()) PUT_NACK from 10.0.10.201@o2ib7
[189163.486250] Lustre: fir-MDT0002: Bulk IO read error with 27c338c1-cde2-284f-c685-3185934e4eac (at 10.8.19.4@o2ib6), client will retry: rc -110
[189164.473285] Lustre: fir-MDT0002: Bulk IO read error with 27c338c1-cde2-284f-c685-3185934e4eac (at 10.8.19.4@o2ib6), client will retry: rc -110
[189165.714244] Lustre: fir-MDT0002: Connection restored to a040630b-6b1d-e359-6ec4-01dbc14e42d3 (at 10.9.117.8@o2ib4)
[189165.724675] Lustre: Skipped 534 previous similar messages
[189214.714562] LustreError: 67208:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 100s: evicting client at 10.9.101.21@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff9a7b79964800/0x746adaf761a96938 lrc: 3/0,0 mode: PR/PR res: [0x2c0000404:0x2f6:0x0].0x0 bits 0x13/0x0 rrc: 10 type: IBT flags: 0x60200400000020 nid: 10.9.101.21@o2ib4 remote: 0xc37d69f29081f058 expref: 87811 pid: 67971 timeout: 189209 lvb_type: 0
[189214.752822] LustreError: 67208:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message
[189228.371928] Lustre: fir-MDT0002: haven't heard from client 552b91a8-70ba-1adf-a557-8183303c8401 (at 10.8.20.9@o2ib6) in 215 seconds. I think it's dead, and I am evicting it. exp ffff9a90fd628c00, cur 1573021598 expire 1573021448 last 1573021383
[189232.578138] LustreError: 11-0: fir-MDT0000-osp-MDT0002: operation mds_statfs to node 10.0.10.51@o2ib7 failed: rc = -107
[189232.589008] Lustre: fir-MDT0000-osp-MDT0002: Connection to fir-MDT0000 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
[189237.923220] Lustre: fir-MDT0002: Client 101c7f86-2210-a696-f6b9-c9f6ce50226a (at 10.9.117.46@o2ib4) reconnecting
[189237.933480] Lustre: Skipped 1103 previous similar messages
[189239.346186] LNet: Service thread pid 67903 was inactive for 200.50s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[189239.363213] Pid: 67903, comm: mdt01_081 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 SMP Mon Aug 5 15:28:37 PDT 2019
[189239.373475] Call Trace:
[189239.376031] [] ldlm_completion_ast+0x430/0x860 [ptlrpc]
[189239.383064] [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc]
[189239.390344] [] mdt_object_local_lock+0x50b/0xb20 [mdt]
[189239.397259] [] mdt_object_lock_internal+0x70/0x360 [mdt]
[189239.404349] [] mdt_layout_change+0x2a4/0x430 [mdt]
[189239.410921] [] mdt_intent_layout+0x7ee/0xcc0 [mdt]
[189239.417500] [] mdt_intent_policy+0x435/0xd80 [mdt]
[189239.424098] [] ldlm_lock_enqueue+0x356/0xa20 [ptlrpc]
[189239.430972] [] ldlm_handle_enqueue0+0xa56/0x15f0 [ptlrpc]
[189239.438187] [] tgt_enqueue+0x62/0x210 [ptlrpc]
[189239.444446] [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[189239.451474] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[189239.459290] [] ptlrpc_main+0xb2c/0x1460 [ptlrpc]
[189239.465712] [] kthread+0xd1/0xe0
[189239.470713] [] ret_from_fork_nospec_begin+0xe/0x21
[189239.477275] [] 0xffffffffffffffff
[189239.482380] LustreError: dumping log to /tmp/lustre-log.1573021609.67903
[189239.715203] LustreError: 67208:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 108s: evicting client at 10.8.17.15@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff9a728cd54ec0/0x746adaf77f1b5bdb lrc: 3/0,0 mode: PW/PW res: [0x2c0032167:0x1c7b0:0x0].0x0 bits 0x40/0x0 rrc: 7 type: IBT flags: 0x60200400000020 nid: 10.8.17.15@o2ib6 remote: 0x6649c066b4c1608 expref: 443877 pid: 67712 timeout: 189234 lvb_type: 0
[189245.715360] LustreError: 67208:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 106s: evicting client at 10.8.0.82@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff9a7177ff2640/0x746adaf77f1ba6ce lrc: 3/0,0 mode: EX/EX res: [0x2c00321e8:0x573e:0x0].0x0 bits 0x8/0x0 rrc: 5 type: IBT flags: 0x60000400000020 nid: 10.8.0.82@o2ib6 remote: 0xd9239c34ef8800fb expref: 5678 pid: 67652 timeout: 189240 lvb_type: 3
[189251.715514] LustreError: 67208:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 107s: evicting client at 10.8.18.28@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff9a62569fb3c0/0x746adaf77f1cfc70 lrc: 3/0,0 mode: EX/EX res: [0x2c00335a5:0x1292c:0x0].0x0 bits 0x8/0x0 rrc: 4 type: IBT flags: 0x60000400000020 nid: 10.8.18.28@o2ib6 remote: 0x945faef430e9353f expref: 695317 pid: 67837 timeout: 189246 lvb_type: 3
[189252.658667] Lustre: fir-MDT0000-lwp-MDT0002: Connection to fir-MDT0000 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
[189281.664156] LustreError: 67990:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli f2898033-6a23-2537-8cf9-46709394f401 claims 147456 GRANT, real grant 53248
[189294.509676] Lustre: fir-MDT0002: Connection restored to (at 10.8.27.20@o2ib6)
[189294.517002] Lustre: Skipped 803 previous similar messages
[189296.687280] LustreError: 67989:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli f2898033-6a23-2537-8cf9-46709394f401 claims 155648 GRANT, real grant 0
[189296.701421] LustreError: 67989:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 209 previous similar messages
[189306.767591] LustreError: 67908:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli f2898033-6a23-2537-8cf9-46709394f401 claims 155648 GRANT, real grant 0
[189306.781670] LustreError: 67908:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 257 previous similar messages
[189321.823763] LustreError: 68006:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli f2898033-6a23-2537-8cf9-46709394f401 claims 155648 GRANT, real grant 0
[189321.837846] LustreError: 68006:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 172 previous similar messages
[189328.439718] LustreError: 67903:0:(ldlm_lockd.c:1348:ldlm_handle_enqueue0()) ### lock on destroyed export ffff9a91bed0f000 ns: mdt-fir-MDT0002_UUID lock: ffff9a7457c83cc0/0x746adaf77f1b58ee lrc: 3/0,0 mode: EX/EX res: [0x2c00335a0:0x174e9:0x0].0x0 bits 0x8/0x0 rrc: 3 type: IBT flags: 0x50000000000000 nid: 10.8.17.16@o2ib6 remote: 0x612952ad3a313dda expref: 193919 pid: 67903 timeout: 0 lvb_type: 3
[189328.474798] Lustre: 67903:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (120:170s); client may timeout. req@ffff9a692af36300 x1648297167944704/t412561706572(0) o101->8f2648b4-4022-d79e-18a5-f850119b4e30@10.8.17.16@o2ib6:168/0 lens 376/1568 e 5 to 0 dl 1573021528 ref 1 fl Complete:/0/0 rc -107/-107
[189328.505135] LNet: Service thread pid 67903 completed after 289.65s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
[189336.880393] LustreError: 67975:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli f2898033-6a23-2537-8cf9-46709394f401 claims 147456 GRANT, real grant 0
[189336.894467] LustreError: 67975:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 262 previous similar messages
[189351.936805] LustreError: 68010:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli f2898033-6a23-2537-8cf9-46709394f401 claims 147456 GRANT, real grant 0
[189351.950879] LustreError: 68010:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 263 previous similar messages
[189382.033542] LustreError: 67962:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli f2898033-6a23-2537-8cf9-46709394f401 claims 155648 GRANT, real grant 0
[189382.047620] LustreError: 67962:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 520 previous similar messages
[189427.170787] LustreError: 67989:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli f2898033-6a23-2537-8cf9-46709394f401 claims 155648 GRANT, real grant 0
[189427.184864] LustreError: 67989:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 793 previous similar messages
[189470.776261] LNet: Service thread pid 67864 was inactive for 226.40s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[189470.793280] Pid: 67864, comm: mdt02_050 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 SMP Mon Aug 5 15:28:37 PDT 2019
[189470.803546] Call Trace:
[189470.806108] [] ptlrpc_set_wait+0x480/0x790 [ptlrpc]
[189470.812820] [] ptlrpc_queue_wait+0x83/0x230 [ptlrpc]
[189470.819598] [] ldlm_cli_enqueue+0x3d2/0x920 [ptlrpc]
[189470.826358] [] osp_md_object_lock+0x162/0x2d0 [osp]
[189470.833032] [] lod_object_lock+0xf3/0x7b0 [lod]
[189470.839342] [] mdd_object_lock+0x3e/0xe0 [mdd]
[189470.845582] [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt]
[189470.852931] [] mdt_remote_object_lock+0x2a/0x30 [mdt]
[189470.859777] [] mdt_rename_lock+0xbe/0x4b0 [mdt]
[189470.866089] [] mdt_reint_rename+0x2c5/0x2b90 [mdt]
[189470.872674] [] mdt_reint_rec+0x83/0x210 [mdt]
[189470.878808] [] mdt_reint_internal+0x6e3/0xaf0 [mdt]
[189470.885474] [] mdt_reint+0x67/0x140 [mdt]
[189470.891264] [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[189470.898318] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[189470.906128] [] ptlrpc_main+0xb2c/0x1460 [ptlrpc]
[189470.912559] [] kthread+0xd1/0xe0
[189470.917560] [] ret_from_fork_nospec_begin+0xe/0x21
[189470.924138] [] 0xffffffffffffffff
[189470.929257] LustreError: dumping log to /tmp/lustre-log.1573021840.67864
[189475.380467] Lustre: fir-MDT0002: haven't heard from client 8f2648b4-4022-d79e-18a5-f850119b4e30 (at 10.8.17.16@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a5bfaada000, cur 1573021845 expire 1573021695 last 1573021618
[189475.402257] Lustre: Skipped 14 previous similar messages
[189502.324678] LustreError: 67959:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli f2898033-6a23-2537-8cf9-46709394f401 claims 155648 GRANT, real grant 0
[189502.338755] LustreError: 67959:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 1325 previous similar messages
[189524.907068] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.105.60@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server.
[189524.924525] LustreError: Skipped 5674 previous similar messages
[189530.219822] LNetError: 67085:0:(o2iblnd_cb.c:3350:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds
[189530.230082] LNetError: 67085:0:(o2iblnd_cb.c:3425:kiblnd_check_conns()) Timed out RDMA with 10.0.10.209@o2ib7 (15): c: 7, oc: 0, rc: 8
[189546.219239] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[189546.231236] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 65 previous similar messages
[189551.385669] Lustre: fir-MDT0002: haven't heard from client 84a5f560-bd35-cee9-1e08-aca7fd4bd4f1 (at 10.9.109.1@o2ib4) in 163 seconds. I think it's dead, and I am evicting it. exp ffff9a911c215800, cur 1573021921 expire 1573021771 last 1573021758
[189558.947558] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.212@o2ib7: -125
[189558.959639] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 4 previous similar messages
[189635.132536] LNet: Service thread pid 67660 was inactive for 200.37s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[189635.149558] Pid: 67660, comm: mdt03_017 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 SMP Mon Aug 5 15:28:37 PDT 2019
[189635.159816] Call Trace:
[189635.162371] [] ptlrpc_set_wait+0x480/0x790 [ptlrpc]
[189635.169059] [] ptlrpc_queue_wait+0x83/0x230 [ptlrpc]
[189635.175820] [] ldlm_cli_enqueue+0x3d2/0x920 [ptlrpc]
[189635.182597] [] osp_md_object_lock+0x162/0x2d0 [osp]
[189635.189246] [] lod_object_lock+0xf3/0x7b0 [lod]
[189635.195578] [] mdd_object_lock+0x3e/0xe0 [mdd]
[189635.201795] [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt]
[189635.209166] [] mdt_remote_object_lock+0x2a/0x30 [mdt]
[189635.216008] [] mdt_rename_lock+0xbe/0x4b0 [mdt]
[189635.222345] [] mdt_reint_rename+0x2c5/0x2b90 [mdt]
[189635.228956] [] mdt_reint_rec+0x83/0x210 [mdt]
[189635.235125] [] mdt_reint_internal+0x6e3/0xaf0 [mdt]
[189635.241825] [] mdt_reint+0x67/0x140 [mdt]
[189635.247648] [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[189635.254715] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[189635.262555] [] ptlrpc_main+0xb2c/0x1460 [ptlrpc]
[189635.268983] [] kthread+0xd1/0xe0
[189635.274036] [] ret_from_fork_nospec_begin+0xe/0x21
[189635.280628] [] 0xffffffffffffffff
[189635.285785] LustreError: dumping log to /tmp/lustre-log.1573022004.67660
[189637.656317] LustreError: 67973:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli f2898033-6a23-2537-8cf9-46709394f401 claims 147456 GRANT, real grant 0
[189637.670392] LustreError: 67973:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 2395 previous similar messages
[189652.028975] LNet: Service thread pid 67879 was inactive for 200.37s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[189652.045995] Pid: 67879, comm: mdt03_058 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 SMP Mon Aug 5 15:28:37 PDT 2019
[189652.056250] Call Trace:
[189652.058807] [] ptlrpc_set_wait+0x480/0x790 [ptlrpc]
[189652.065502] [] ptlrpc_queue_wait+0x83/0x230 [ptlrpc]
[189652.072297] [] ldlm_cli_enqueue+0x3d2/0x920 [ptlrpc]
[189652.079047] [] osp_md_object_lock+0x162/0x2d0 [osp]
[189652.085712] [] lod_object_lock+0xf3/0x7b0 [lod]
[189652.092029] [] mdd_object_lock+0x3e/0xe0 [mdd]
[189652.098261] [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt]
[189652.105611] [] mdt_remote_object_lock+0x2a/0x30 [mdt]
[189652.112455] [] mdt_rename_lock+0xbe/0x4b0 [mdt]
[189652.118768] [] mdt_reint_rename+0x2c5/0x2b90 [mdt]
[189652.125330] [] mdt_reint_rec+0x83/0x210 [mdt]
[189652.131480] [] mdt_reint_internal+0x6e3/0xaf0 [mdt]
[189652.138130] [] mdt_reint+0x67/0x140 [mdt]
[189652.143926] [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[189652.150957] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[189652.158788] [] ptlrpc_main+0xb2c/0x1460 [ptlrpc]
[189652.165207] [] kthread+0xd1/0xe0
[189652.170212] [] ret_from_fork_nospec_begin+0xe/0x21
[189652.176777] [] 0xffffffffffffffff
[189652.181899] LustreError: dumping log to /tmp/lustre-log.1573022021.67879
[189652.189430] Pid: 67773, comm: mdt03_038 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 SMP Mon Aug 5 15:28:37 PDT 2019
[189652.199702] Call Trace:
[189652.202248] [] ptlrpc_set_wait+0x480/0x790 [ptlrpc]
[189652.208922] [] ptlrpc_queue_wait+0x83/0x230 [ptlrpc]
[189652.215698] [] ldlm_cli_enqueue+0x3d2/0x920 [ptlrpc]
[189652.222467] [] osp_md_object_lock+0x162/0x2d0 [osp]
[189652.229132] [] lod_object_lock+0xf3/0x7b0 [lod]
[189652.235436] [] mdd_object_lock+0x3e/0xe0 [mdd]
[189652.241663] [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt]
[189652.249016] [] mdt_remote_object_lock+0x2a/0x30 [mdt]
[189652.255852] [] mdt_rename_lock+0xbe/0x4b0 [mdt]
[189652.262154] [] mdt_reint_rename+0x2c5/0x2b90 [mdt]
[189652.268738] [] mdt_reint_rec+0x83/0x210 [mdt]
[189652.274868] [] mdt_reint_internal+0x6e3/0xaf0 [mdt]
[189652.281531] [] mdt_reint+0x67/0x140 [mdt]
[189652.287338] [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[189652.294383] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[189652.302187] [] ptlrpc_main+0xb2c/0x1460 [ptlrpc]
[189652.308614] [] kthread+0xd1/0xe0
[189652.313610] [] ret_from_fork_nospec_begin+0xe/0x21
[189652.320178] [] 0xffffffffffffffff
[189658.225151] Lustre: fir-MDT0002: Connection restored to (at 10.0.10.51@o2ib7)
[189658.232463] Lustre: Skipped 4 previous similar messages
[189673.990216] Lustre: fir-MDT0002: Client 80a8792f-a989-a169-9080-96907468b701 (at 10.9.113.10@o2ib4) reconnecting
[189674.000484] Lustre: Skipped 135 previous similar messages
[189690.222972] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds
[189690.233233] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 102 previous similar messages
[189754.432745] LustreError: 167-0: fir-MDT0000-lwp-MDT0002: This client was evicted by fir-MDT0000; in progress operations using this service will fail.
[189754.552975] LustreError: 11-0: fir-MDT0000-lwp-MDT0002: operation quota_acquire to node 10.0.10.51@o2ib7 failed: rc = -11
[189754.564018] LustreError: Skipped 1 previous similar message
[189809.325704] LNet: Service thread pid 67864 completed after 564.94s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
[189809.341990] LNet: Skipped 2 previous similar messages
[189899.758970] LustreError: 67975:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli f2898033-6a23-2537-8cf9-46709394f401 claims 147456 GRANT, real grant 0
[189899.773114] LustreError: 67975:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 1593 previous similar messages
[190148.234950] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[190148.246945] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 120 previous similar messages
[190290.238631] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds
[190290.248884] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 111 previous similar messages
[190340.254939] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.203@o2ib7: -125
[190340.267029] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 4 previous similar messages
[190421.452444] LustreError: 67260:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli f2898033-6a23-2537-8cf9-46709394f401 claims 147456 GRANT, real grant 0
[190421.466519] LustreError: 67260:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 7480 previous similar messages
[190751.250519] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[190751.262543] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 134 previous similar messages
[190826.723918] Lustre: fir-MDT0002: Connection restored to c7da60e2-1da2-1aa0-71e3-83d7cefee0c5 (at 10.9.112.3@o2ib4)
[190826.734361] Lustre: Skipped 4 previous similar messages
[190891.254074] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.212@o2ib7: 1 seconds
[190891.264333] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 119 previous similar messages
[191028.267628] LustreError: 68011:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli f2898033-6a23-2537-8cf9-46709394f401 claims 126976 GRANT, real grant 0
[191028.281702] LustreError: 68011:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 8342 previous similar messages
[191065.652468] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.212@o2ib7: -125
[191065.664559] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 4 previous similar messages
[191352.265733] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[191352.277733] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 135 previous similar messages
[191495.269365] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.212@o2ib7: 0 seconds
[191495.279651] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 115 previous similar messages
[191629.536073] LustreError: 67969:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli f2898033-6a23-2537-8cf9-46709394f401 claims 139264 GRANT, real grant 0
[191629.550156] LustreError: 67969:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 7625 previous similar messages
[191845.961405] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.203@o2ib7: -125
[191845.973492] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 4 previous similar messages
[191954.281240] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[191954.293233] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 133 previous similar messages
[192095.284895] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds
[192095.295156] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 115 previous similar messages
[192230.032251] LustreError: 67989:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli f2898033-6a23-2537-8cf9-46709394f401 claims 143360 GRANT, real grant 0
[192230.046338] LustreError: 67989:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 7049 previous similar messages
[192555.305860] LNetError: 93340:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[192555.317855] LNetError: 93340:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 128 previous similar messages
[192570.307306] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.212@o2ib7: -125
[192570.319398] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 4 previous similar messages
[192571.037910] Lustre: fir-MDT0002: Connection restored to (at 10.9.107.66@o2ib4)
[192601.382030] Lustre: fir-MDT0002: Connection restored to b5279a15-f23e-ea46-d68a-9fe8704ff580 (at 10.9.106.19@o2ib4)
[192696.300551] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds
[192696.310814] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 117 previous similar messages
[192711.936758] Lustre: fir-MDT0002: Connection restored to 668af727-5721-afb2-3f2d-f797a01fe0c3 (at 10.9.106.14@o2ib4)
[192830.351181] LustreError: 67967:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli f2898033-6a23-2537-8cf9-46709394f401 claims 135168 GRANT, real grant 0
[192830.365264] LustreError: 67967:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 7143 previous similar messages
[193159.312650] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[193159.324655] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 130 previous similar messages
[193300.316281] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.212@o2ib7: 0 seconds
[193300.326540] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 110 previous similar messages
[193345.551459] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.203@o2ib7: -125
[193345.563558] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 4 previous similar messages
[193430.927612] LustreError: 68002:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli f2898033-6a23-2537-8cf9-46709394f401 claims 155648 GRANT, real grant 0
[193430.941697] LustreError: 68002:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 6615 previous similar messages
[193759.715992] LNetError: 93340:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[193759.727991] LNetError: 93340:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 125 previous similar messages
[193905.331784] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds
[193905.342041] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 111 previous similar messages
[193946.757855] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.203@o2ib7: -125
[193946.769941] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 3 previous similar messages
[194030.970020] LustreError: 68008:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli f2898033-6a23-2537-8cf9-46709394f401 claims 155648 GRANT, real grant 0
[194030.984110] LustreError: 68008:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 5394 previous similar messages
[194363.344597] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[194363.356598] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 123 previous similar messages
[194505.347337] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds
[194505.357598] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 106 previous similar messages
[194554.871623] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.203@o2ib7: -125
[194554.883707] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 3 previous similar messages
[194631.086290] LustreError: 67259:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli f2898033-6a23-2537-8cf9-46709394f401 claims 147456 GRANT, real grant 0
[194631.100370] LustreError: 67259:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 5351 previous similar messages
[194964.359316] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[194964.371311] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 132 previous similar messages
[195110.363085] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds
[195110.373343] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 116 previous similar messages
[195156.018225] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.203@o2ib7: -125
[195156.030309] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 3 previous similar messages
[195231.120770] LustreError: 67260:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli f2898033-6a23-2537-8cf9-46709394f401 claims 77824 GRANT, real grant 0
[195231.134765] LustreError: 67260:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 5729 previous similar messages
[195468.278701] Lustre: fir-MDT0002: Connection restored to (at 10.9.106.22@o2ib4)
[195496.533969] Lustre: fir-MDT0002: haven't heard from client 7a8a4b1d-d199-c9c0-dfb9-109689624b92 (at 10.9.106.29@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a919bbc5c00, cur 1573027866 expire 1573027716 last 1573027639
[195565.159468] LNetError: 94220:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[195565.171493] LNetError: 94220:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 126 previous similar messages
[195661.538232] Lustre: fir-MDT0002: haven't heard from client dbe77bd6-d521-20d5-8cb3-fd42f94ef38e (at 10.9.112.4@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a8c98b6ec00, cur 1573028031 expire 1573027881 last 1573027804
[195710.378470] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds
[195710.388726] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 114 previous similar messages
[195831.502355] LustreError: 68055:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli f2898033-6a23-2537-8cf9-46709394f401 claims 131072 GRANT, real grant 0
[195831.516435] LustreError: 68055:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 6835 previous similar messages
[195880.234782] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.212@o2ib7: -125
[195880.246869] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 4 previous similar messages
[196168.390162] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[196168.402158] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 125 previous similar messages
[196311.393795] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 1 seconds
[196311.404056] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 110 previous similar messages
[196431.572697] LustreError: 67962:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli f2898033-6a23-2537-8cf9-46709394f401 claims 143360 GRANT, real grant 0
[196431.586812] LustreError: 67962:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 6964 previous similar messages
[196661.453815] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.203@o2ib7: -125
[196661.465902] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 4 previous similar messages
[196770.490200] LNetError: 94220:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[196770.502201] LNetError: 94220:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 120 previous similar messages
[196859.374815] Lustre: fir-MDT0002: Connection restored to (at 10.9.106.29@o2ib4)
[196911.409177] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds
[196911.419435] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 110 previous similar messages
[197031.644102] LustreError: 68011:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli f2898033-6a23-2537-8cf9-46709394f401 claims 135168 GRANT, real grant 0
[197031.658225] LustreError: 68011:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 6886 previous similar messages
[197374.421013] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[197374.433019] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 119 previous similar messages
[197380.593159] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.212@o2ib7: -125
[197380.605251] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 4 previous similar messages
[197395.855920] Lustre: fir-MDT0002: Connection restored to (at 10.9.112.4@o2ib4)
[197515.424624] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds
[197515.434885] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 101 previous similar messages
[197631.873974] LustreError: 68015:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli f2898033-6a23-2537-8cf9-46709394f401 claims 143360 GRANT, real grant 0
[197631.888066] LustreError: 68015:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 6947 previous similar messages
[197975.436664] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[197975.448667] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 112 previous similar messages
[197981.660825] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.212@o2ib7: -125
[197981.672911] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 3 previous similar messages
[198120.440431] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds
[198120.450703] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 102 previous similar messages
[198231.954048] LustreError: 68068:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli f2898033-6a23-2537-8cf9-46709394f401 claims 155648 GRANT, real grant 0
[198231.968130] LustreError: 68068:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 5829 previous similar messages
[198577.452302] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[198577.464297] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 115 previous similar messages
[198589.879613] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.212@o2ib7: -125
[198589.891697] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 3 previous similar messages
[198725.457102] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.212@o2ib7: 0 seconds
[198725.467356] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 104 previous similar messages
[198832.042296] LustreError: 68019:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli f2898033-6a23-2537-8cf9-46709394f401 claims 155648 GRANT, real grant 0
[198832.056384] LustreError: 68019:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 5750 previous similar messages
[199178.467496] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[199178.479509] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 129 previous similar messages
[199191.111812] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.212@o2ib7: -125
[199191.123896] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 3 previous similar messages
[199341.471695] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 1 seconds
[199341.481953] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 109 previous similar messages
[199433.295947] LustreError: 68068:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli f2898033-6a23-2537-8cf9-46709394f401 claims 135168 GRANT, real grant 0
[199433.310029] LustreError: 68068:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 6054 previous similar messages
[199780.483071] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[199780.495068] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 117 previous similar messages
[199945.487342] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds
[199945.497601] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 102 previous similar messages
[199966.194886] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.203@o2ib7: -125
[199966.206970] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 4 previous similar messages
[200033.834402] LustreError: 67260:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli f2898033-6a23-2537-8cf9-46709394f401 claims 122880 GRANT, real grant 0
[200033.848508] LustreError: 67260:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 6866 previous similar messages
[200380.498646] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[200380.510643] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 111 previous similar messages
[200550.503095] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds
[200550.513354] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 104 previous similar messages
[200575.267745] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.203@o2ib7: -125
[200575.279851] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 3 previous similar messages
[200633.900555] LustreError: 68068:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli f2898033-6a23-2537-8cf9-46709394f401 claims 155648 GRANT, real grant 0
[200633.914710] LustreError: 68068:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 7036 previous similar messages
[200855.672316] Lustre: fir-MDT0002: haven't heard from client 0caf5760-1b86-b378-15f3-8416e7d8bfad (at 10.9.112.5@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a90a3672800, cur 1573033225 expire 1573033075 last 1573032998
[200981.514412] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[200981.526410] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 114 previous similar messages
[201150.518853] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds
[201150.529114] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 99 previous similar messages
[201233.929096] LustreError: 67992:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli f2898033-6a23-2537-8cf9-46709394f401 claims 131072 GRANT, real grant 0
[201233.943179] LustreError: 67992:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 7422 previous similar messages
[201292.346634] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.212@o2ib7: -125
[201292.358718] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 4 previous similar messages
[201585.386767] LNetError: 95220:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[201585.398762] LNetError: 95220:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 120 previous similar messages
[201755.534886] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.212@o2ib7: 0 seconds
[201755.545146] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 100 previous similar messages
[201834.620362] LustreError: 67994:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli 09d35619-6b74-febd-1dd5-6d4a61665424 claims 28672 GRANT, real grant 0
[201834.634360] LustreError: 67994:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 7195 previous similar messages
[201901.410736] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.212@o2ib7: -125
[201901.422820] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 3 previous similar messages
[201908.650422] Lustre: fir-MDT0002: Connection restored to (at 10.9.106.33@o2ib4)
[202082.543504] LNetError: 67085:0:(o2iblnd_cb.c:3350:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds
[202082.553761] LNetError: 67085:0:(o2iblnd_cb.c:3425:kiblnd_check_conns()) Timed out RDMA with 10.0.10.51@o2ib7 (1): c: 4, oc: 0, rc: 7
[202082.566064] LNetError: 67085:0:(peer.c:3451:lnet_peer_ni_add_to_recoveryq_locked()) lpni 10.0.10.51@o2ib7 added to recovery queue. Health = 900
[202086.902624] Lustre: 67133:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1573034448/real 1573034448] req@ffff9a78bcf6e300 x1649330258380528/t0(0) o41->fir-MDT0000-osp-MDT0002@10.0.10.51@o2ib7:24/4 lens 224/368 e 0 to 1 dl 1573034455 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[202086.930863] Lustre: 67133:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 1 previous similar message
[202086.940618] Lustre: fir-MDT0000-osp-MDT0002: Connection to fir-MDT0000 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
[202088.001960] Lustre: fir-MDT0002: Client 40df94b7-4e65-3458-a595-b9607572f9d8 (at 10.9.102.22@o2ib4) reconnecting
[202088.012244] Lustre: fir-MDT0002: Connection restored to 40df94b7-4e65-3458-a595-b9607572f9d8 (at 10.9.102.22@o2ib4)
[202091.388604] Lustre: fir-MDT0002: Connection restored to 76edd001-e52d-04d1-7562-2383f7d6c64f (at 10.9.101.68@o2ib4)
[202119.295474] Lustre: 67142:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1573034481/real 0] req@ffff9a74757ead00 x1649330258473440/t0(0) o400->MGC10.0.10.51@o2ib7@10.0.10.51@o2ib7:26/25 lens 224/224 e 0 to 1 dl 1573034488 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
[202119.322829] Lustre: 67142:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 1 previous similar message
[202119.332578] LustreError: 166-1: MGC10.0.10.51@o2ib7: Connection to MGS (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will fail
[202125.222752] Lustre: fir-MDT0002: Received new LWP connection from 10.0.10.51@o2ib7, removing former export from same NID
[202125.233733] Lustre: fir-MDT0002: Connection restored to (at 10.0.10.51@o2ib7)
[202134.847919] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.104.7@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server.
[202134.865303] LustreError: Skipped 2618 previous similar messages
[202169.473196] Lustre: MGC10.0.10.51@o2ib7: Connection restored to MGC10.0.10.51@o2ib7_0 (at 10.0.10.51@o2ib7)
[202188.546298] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[202188.558289] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 122 previous similar messages
[202194.561653] Lustre: fir-MDT0000-osp-MDT0002: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7)
[202365.550912] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds
[202365.561174] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 113 previous similar messages
[202434.702671] LustreError: 67962:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli f2898033-6a23-2537-8cf9-46709394f401 claims 143360 GRANT, real grant 0
[202434.716752] LustreError: 67962:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 7753 previous similar messages
[202565.243737] Lustre: fir-MDT0002: Connection restored to 0caf5760-1b86-b378-15f3-8416e7d8bfad (at 10.9.112.5@o2ib4)
[202681.500145] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.203@o2ib7: -125
[202681.512232] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 4 previous similar messages
[202790.543303] LNetError: 95220:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[202790.555303] LNetError: 95220:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 116 previous similar messages
[202965.566506] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds
[202965.576765] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 105 previous similar messages
[203034.768411] LustreError: 68026:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli f2898033-6a23-2537-8cf9-46709394f401 claims 155648 GRANT, real grant 0
[203034.782496] LustreError: 68026:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 6465 previous similar messages
[203391.577629] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[203391.589622] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 112 previous similar messages
[203406.601015] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.212@o2ib7: -125
[203406.613122] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 4 previous similar messages
[203572.582399] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds
[203572.592654] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 103 previous similar messages
[203634.922894] LustreError: 68012:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli f2898033-6a23-2537-8cf9-46709394f401 claims 135168 GRANT, real grant 0
[203634.936985] LustreError: 68012:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 6475 previous similar messages
[203992.593461] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[203992.605454] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 114 previous similar messages
[204175.598309] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 1 seconds
[204175.608568] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 103 previous similar messages
[204181.699471] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.203@o2ib7: -125
[204181.711557] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 4 previous similar messages
[204234.972450] LustreError: 68011:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli f2898033-6a23-2537-8cf9-46709394f401 claims 155648 GRANT, real grant 0
[204234.986535] LustreError: 68011:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 6829 previous similar messages
[204595.758007] LNetError: 96128:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue.
Health = 900 [204595.770025] LNetError: 96128:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 112 previous similar messages [204776.614164] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds [204776.624422] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 98 previous similar messages [204788.778489] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.203@o2ib7: -125 [204788.790585] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 3 previous similar messages [204835.049801] LustreError: 67259:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli f2898033-6a23-2537-8cf9-46709394f401 claims 131072 GRANT, real grant 0 [204835.063906] LustreError: 67259:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 5926 previous similar messages [205199.625305] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900 [205199.637298] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 124 previous similar messages [205378.630006] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds [205378.640267] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 106 previous similar messages [205390.861332] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.203@o2ib7: -125 [205390.873421] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 3 previous similar messages [205435.086059] LustreError: 68021:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli f2898033-6a23-2537-8cf9-46709394f401 claims 110592 GRANT, real grant 0 [205435.100136] LustreError: 68021:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 5211 previous similar messages [205800.641114] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900 [205800.653114] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 123 previous similar messages [205980.645837] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds [205980.656093] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 113 previous similar messages [206041.141098] LustreError: 67981:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli f2898033-6a23-2537-8cf9-46709394f401 claims 155648 GRANT, real grant 0 [206041.155174] LustreError: 67981:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 5675 previous similar messages [206115.977400] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.212@o2ib7: -125 [206115.989485] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 4 previous similar messages [206403.657021] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. 
Health = 900 [206403.669013] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 124 previous similar messages [206582.661756] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds [206582.672015] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 106 previous similar messages [206647.665554] LustreError: 68011:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli f2898033-6a23-2537-8cf9-46709394f401 claims 143360 GRANT, real grant 0 [206647.679631] LustreError: 68011:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 5834 previous similar messages [206896.082109] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.203@o2ib7: -125 [206896.094191] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 4 previous similar messages [207005.111315] LNetError: 96652:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900 [207005.123333] LNetError: 96652:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 121 previous similar messages [207184.677686] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds [207184.687944] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 101 previous similar messages [207258.997230] LustreError: 68031:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli f2898033-6a23-2537-8cf9-46709394f401 claims 122880 GRANT, real grant 0 [207259.011310] LustreError: 68031:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 6368 previous similar messages [207605.170943] LNetError: 96652:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900 [207605.182940] LNetError: 96652:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 115 previous similar messages [207620.171052] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.212@o2ib7: -125 [207620.183139] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 4 previous similar messages [207785.693489] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds [207785.703830] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 101 previous similar messages [207868.731152] LustreError: 67259:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli 09d35619-6b74-febd-1dd5-6d4a61665424 claims 28672 GRANT, real grant 0 [207868.745138] LustreError: 67259:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 943 previous similar messages [208205.244714] LNetError: 96652:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. 
Health = 900 [208205.256712] LNetError: 96652:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 115 previous similar messages [208289.616235] Lustre: fir-MDT0002: Connection restored to 7cfd3b13-0dc3-267a-ac99-3d290eaa7eda (at 10.9.107.71@o2ib4) [208388.709083] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 1 seconds [208388.719337] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 103 previous similar messages [208401.267410] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.203@o2ib7: -125 [208401.279491] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 4 previous similar messages [208489.211427] LustreError: 68023:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli 09d35619-6b74-febd-1dd5-6d4a61665424 claims 28672 GRANT, real grant 0 [208489.225419] LustreError: 68023:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 11 previous similar messages [208795.735261] Lustre: fir-MDT0002: Connection restored to 918e914c-7fde-28f7-b867-3f6762039fa7 (at 10.9.106.25@o2ib4) [208805.719917] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900 [208805.731911] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 115 previous similar messages [208834.881743] Lustre: fir-MDT0002: haven't heard from client e70b96f4-068f-54cf-5abc-6ce13d981179 (at 10.9.108.47@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a8847118400, cur 1573041204 expire 1573041054 last 1573040977 [208989.724722] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds [208989.734977] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 100 previous similar messages [209083.930910] Lustre: fir-MDT0002: Connection restored to (at 10.9.106.35@o2ib4) [209103.803421] LustreError: 68061:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli 09d35619-6b74-febd-1dd5-6d4a61665424 claims 28672 GRANT, real grant 0 [209103.817416] LustreError: 68061:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 14 previous similar messages [209125.356261] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.212@o2ib7: -125 [209125.368343] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 4 previous similar messages [209405.735587] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. 
Health = 900 [209405.747586] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 116 previous similar messages [209590.740435] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.212@o2ib7: 0 seconds [209590.750697] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 102 previous similar messages [209716.235161] LustreError: 67932:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli 09d35619-6b74-febd-1dd5-6d4a61665424 claims 28672 GRANT, real grant 0 [209716.249160] LustreError: 67932:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 11 previous similar messages [209905.451643] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.203@o2ib7: -125 [209905.463725] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 4 previous similar messages [210010.751390] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900 [210010.763393] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 119 previous similar messages [210193.756158] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds [210193.766417] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 103 previous similar messages [210293.764309] Lustre: fir-MDT0002: Connection restored to (at 10.9.108.47@o2ib4) [210369.596165] LustreError: 67972:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli 09d35619-6b74-febd-1dd5-6d4a61665424 claims 28672 GRANT, real grant 0 [210369.610159] LustreError: 67972:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 12 previous similar messages [210414.054006] Lustre: fir-MDT0002: Connection restored to a6b2fe14-6d3a-b636-1a62-d5dba2f1d9eb (at 10.9.106.18@o2ib4) [210615.539506] LNetError: 97518:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900 [210615.551530] LNetError: 97518:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 118 previous similar messages [210630.539574] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.212@o2ib7: -125 [210630.551662] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 4 previous similar messages [210795.771863] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds [210795.782125] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 100 previous similar messages [210951.420830] Lustre: fir-MDT0002: Connection restored to bc665767-df8b-9748-9d41-68bff1ff621a (at 10.9.113.13@o2ib4) [211007.917044] LustreError: 67993:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli 09d35619-6b74-febd-1dd5-6d4a61665424 claims 28672 GRANT, real grant 0 [211007.931040] LustreError: 67993:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 12 previous similar messages [211218.783908] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. 
Health = 900 [211218.795902] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 121 previous similar messages [211395.787514] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds [211395.797773] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 103 previous similar messages [211411.642929] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.203@o2ib7: -125 [211411.655026] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 4 previous similar messages [211635.677275] LustreError: 68070:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli 09d35619-6b74-febd-1dd5-6d4a61665424 claims 28672 GRANT, real grant 0 [211635.691269] LustreError: 68070:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 12 previous similar messages [211820.798608] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900 [211820.810605] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 114 previous similar messages [211995.803153] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds [211995.813415] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 101 previous similar messages [212135.724798] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.212@o2ib7: -125 [212135.736883] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 4 previous similar messages [212289.422017] LustreError: 67972:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli 09d35619-6b74-febd-1dd5-6d4a61665424 claims 32768 GRANT, real grant 0 [212289.436010] LustreError: 67972:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 12 previous similar messages [212423.814254] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900 [212423.826255] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 112 previous similar messages [212601.818787] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 1 seconds [212601.829044] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 103 previous similar messages [212737.804315] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.212@o2ib7: -125 [212737.816397] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 3 previous similar messages [212894.909879] LustreError: 68070:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli 09d35619-6b74-febd-1dd5-6d4a61665424 claims 28672 GRANT, real grant 0 [212894.923876] LustreError: 68070:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 11 previous similar messages [213025.829748] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. 
Health = 900 [213025.841746] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 120 previous similar messages [213204.834351] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds [213204.844612] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 106 previous similar messages [213511.900296] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.203@o2ib7: -125 [213511.912374] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 3 previous similar messages [213563.119124] LustreError: 68012:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli 09d35619-6b74-febd-1dd5-6d4a61665424 claims 32768 GRANT, real grant 0 [213563.133147] LustreError: 68012:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 12 previous similar messages [213625.939770] LNetError: 98415:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900 [213625.951768] LNetError: 98415:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 122 previous similar messages [213806.849924] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds [213806.860178] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 112 previous similar messages [214120.003954] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.203@o2ib7: -125 [214120.016039] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 3 previous similar messages [214205.007237] LustreError: 67993:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli 09d35619-6b74-febd-1dd5-6d4a61665424 claims 28672 GRANT, real grant 0 [214205.021229] LustreError: 67993:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 12 previous similar messages [214226.034254] LNetError: 98680:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900 [214226.046274] LNetError: 98680:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 118 previous similar messages [214406.865353] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds [214406.875609] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 100 previous similar messages [214723.078325] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.203@o2ib7: -125 [214723.090425] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 3 previous similar messages [214819.119153] LustreError: 67998:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli 09d35619-6b74-febd-1dd5-6d4a61665424 claims 28672 GRANT, real grant 0 [214819.133144] LustreError: 67998:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 13 previous similar messages [214831.107401] LNetError: 98816:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. 
Health = 900 [214831.119419] LNetError: 98816:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 117 previous similar messages [215006.880485] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds [215006.890748] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 103 previous similar messages [215428.734795] LustreError: 67978:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli 09d35619-6b74-febd-1dd5-6d4a61665424 claims 28672 GRANT, real grant 0 [215428.748788] LustreError: 67978:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 14 previous similar messages [215431.891341] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900 [215431.903333] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 119 previous similar messages [215447.179721] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.212@o2ib7: -125 [215447.191805] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 4 previous similar messages [215612.895953] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 1 seconds [215612.906216] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 104 previous similar messages [216035.906771] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900 [216035.918765] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 118 previous similar messages [216075.695434] LustreError: 67259:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli 09d35619-6b74-febd-1dd5-6d4a61665424 claims 28672 GRANT, real grant 0 [216075.709423] LustreError: 67259:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 14 previous similar messages [216214.911377] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds [216214.921633] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 101 previous similar messages [216227.278700] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.203@o2ib7: -125 [216227.290791] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 4 previous similar messages [216557.101747] Lustre: fir-MDT0002: haven't heard from client 8b633e26-6674-1118-73a2-cadebb615cfb (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a91bedf2c00, cur 1573048926 expire 1573048776 last 1573048699 [216636.923486] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. 
Health = 900 [216636.935478] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 111 previous similar messages [216687.439280] LustreError: 68008:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli 09d35619-6b74-febd-1dd5-6d4a61665424 claims 28672 GRANT, real grant 0 [216687.453273] LustreError: 68008:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 13 previous similar messages [216816.927190] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds [216816.937451] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 97 previous similar messages [216952.349715] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.212@o2ib7: -125 [216952.361800] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 4 previous similar messages [216981.286188] Lustre: fir-MDT0002: Connection restored to 8b633e26-6674-1118-73a2-cadebb615cfb (at 10.8.23.14@o2ib6) [217237.938168] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900 [217237.950162] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 115 previous similar messages [217330.865013] LustreError: 68000:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli 09d35619-6b74-febd-1dd5-6d4a61665424 claims 28672 GRANT, real grant 0 [217330.879005] LustreError: 68000:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 15 previous similar messages [217416.942828] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds [217416.953085] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 101 previous similar messages [217732.450037] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.203@o2ib7: -125 [217732.462127] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 4 previous similar messages [217840.953859] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900 [217840.965853] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 122 previous similar messages [217938.831749] LustreError: 67990:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli 09d35619-6b74-febd-1dd5-6d4a61665424 claims 28672 GRANT, real grant 0 [217938.845734] LustreError: 67990:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 12 previous similar messages [218019.958495] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds [218019.968755] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 104 previous similar messages [218388.128201] Lustre: fir-MDT0002: haven't heard from client cfb176ec-35a3-9d2c-ef29-b7659a9f491e (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6e8a0a1400, cur 1573050757 expire 1573050607 last 1573050530 [218441.969457] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. 
Health = 900 [218441.981451] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 117 previous similar messages [218456.548835] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.212@o2ib7: -125 [218456.560923] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 4 previous similar messages [218544.447506] LustreError: 68000:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli 09d35619-6b74-febd-1dd5-6d4a61665424 claims 28672 GRANT, real grant 0 [218544.461503] LustreError: 68000:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 11 previous similar messages [218621.974115] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds [218621.984371] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 103 previous similar messages [218777.897215] Lustre: fir-MDT0002: Connection restored to 8b633e26-6674-1118-73a2-cadebb615cfb (at 10.8.23.14@o2ib6) [219018.599314] Lustre: fir-MDT0002: Connection restored to 8b633e26-6674-1118-73a2-cadebb615cfb (at 10.8.23.14@o2ib6) [219044.985082] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900 [219044.997082] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 122 previous similar messages [219055.150199] Lustre: fir-MDT0002: haven't heard from client a5c35592-80b2-37c7-f125-2658fd4f309a (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a7b20a0c800, cur 1573051424 expire 1573051274 last 1573051197 [219057.635380] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.212@o2ib7: -125 [219057.647467] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 3 previous similar messages [219131.157557] Lustre: fir-MDT0002: haven't heard from client 9bfe4653-6d8c-a917-f276-9bc7dc8bcdb6 (at 10.9.106.21@o2ib4) in 207 seconds. I think it's dead, and I am evicting it. exp ffff9a91b944ec00, cur 1573051500 expire 1573051350 last 1573051293 [219150.286987] LustreError: 68011:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli 09d35619-6b74-febd-1dd5-6d4a61665424 claims 28672 GRANT, real grant 0 [219150.300982] LustreError: 68011:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 14 previous similar messages [219224.989712] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 1 seconds [219224.999971] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 105 previous similar messages [219646.000366] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. 
Health = 900 [219646.012363] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 120 previous similar messages [219755.678581] LustreError: 68002:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli 09d35619-6b74-febd-1dd5-6d4a61665424 claims 28672 GRANT, real grant 0 [219755.692587] LustreError: 68002:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 16 previous similar messages [219826.004899] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds [219826.015157] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 103 previous similar messages [219832.729081] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.203@o2ib7: -125 [219832.741167] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 4 previous similar messages [219924.181419] Lustre: fir-MDT0002: haven't heard from client f71a05cb-0281-cfe8-7d7a-9b10e944e06b (at 10.8.9.1@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a61be4c3000, cur 1573052293 expire 1573052143 last 1573052066 [220249.015635] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900 [220249.027632] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 123 previous similar messages [220369.229974] LustreError: 68000:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli 09d35619-6b74-febd-1dd5-6d4a61665424 claims 28672 GRANT, real grant 0 [220369.243970] LustreError: 68000:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 16 previous similar messages [220427.020159] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds [220427.030412] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 99 previous similar messages [220440.803556] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.203@o2ib7: -125 [220440.815636] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 3 previous similar messages [220587.585232] perf: interrupt took too long (4067 > 4060), lowering kernel.perf_event_max_sample_rate to 49000 [220636.216057] Lustre: fir-MDT0002: Connection restored to 9bfe4653-6d8c-a917-f276-9bc7dc8bcdb6 (at 10.9.106.21@o2ib4) [220822.852856] Lustre: fir-MDT0002: Connection restored to 52fcaa53-c996-aeea-54dc-fce7f1ad52db (at 10.9.106.31@o2ib4) [220851.031018] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. 
Health = 900 [220851.043011] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 115 previous similar messages [220991.998040] LustreError: 92868:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli 09d35619-6b74-febd-1dd5-6d4a61665424 claims 28672 GRANT, real grant 0 [220992.012034] LustreError: 92868:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 15 previous similar messages [221027.035586] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds [221027.045847] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 100 previous similar messages [221043.871019] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.203@o2ib7: -125 [221043.883103] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 3 previous similar messages [221153.216963] Lustre: fir-MDT0002: haven't heard from client 937a40e6-eeab-284d-3f33-c5218f968624 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a58a623c800, cur 1573053522 expire 1573053372 last 1573053295 [221194.577011] Lustre: fir-MDT0002: Connection restored to 8b633e26-6674-1118-73a2-cadebb615cfb (at 10.8.23.14@o2ib6) [221453.046607] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900 [221453.058606] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 116 previous similar messages [221613.054135] LustreError: 68060:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli 09d35619-6b74-febd-1dd5-6d4a61665424 claims 28672 GRANT, real grant 0 [221613.068124] LustreError: 68060:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 13 previous similar messages [221632.051231] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds [221632.061488] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 104 previous similar messages [221766.962701] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.212@o2ib7: -125 [221766.974793] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 4 previous similar messages [222055.062125] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900 [222055.074119] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 105 previous similar messages [222234.066723] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds [222234.076978] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 101 previous similar messages [222278.255286] LustreError: 68032:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli 09d35619-6b74-febd-1dd5-6d4a61665424 claims 28672 GRANT, real grant 0 [222278.269287] LustreError: 68032:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 12 previous similar messages [222549.055827] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.203@o2ib7: -125 [222549.067910] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 4 previous similar messages [222657.077614] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. 
Health = 900 [222657.089613] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 120 previous similar messages [222837.082233] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds [222837.092493] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 103 previous similar messages [222887.422900] LustreError: 68070:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli 09d35619-6b74-febd-1dd5-6d4a61665424 claims 28672 GRANT, real grant 0 [222887.436894] LustreError: 68070:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 12 previous similar messages [222899.244174] Lustre: fir-MDT0002: haven't heard from client eb5f5496-fd7a-0473-a1d5-6c207282a568 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a75df49bc00, cur 1573055268 expire 1573055118 last 1573055041 [222910.736137] Lustre: fir-MDT0002: Connection restored to 11ac701a-2377-2e6d-d133-3035ac447a2d (at 10.8.30.17@o2ib6) [222975.245793] Lustre: fir-MDT0002: haven't heard from client 668eb028-82c2-c6e3-1d8f-48e15c9354b0 (at 10.9.104.26@o2ib4) in 180 seconds. I think it's dead, and I am evicting it. exp ffff9a61a9a28800, cur 1573055344 expire 1573055194 last 1573055164 [222994.855440] Lustre: fir-MDT0002: Connection restored to 8b633e26-6674-1118-73a2-cadebb615cfb (at 10.8.23.14@o2ib6) [223257.159566] LNetError: 100892:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900 [223257.171650] LNetError: 100892:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 116 previous similar messages [223271.633735] Lustre: fir-MDT0002: Connection restored to 8b633e26-6674-1118-73a2-cadebb615cfb (at 10.8.23.14@o2ib6) [223272.160410] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.212@o2ib7: -125 [223272.172495] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 4 previous similar messages [223297.254150] Lustre: fir-MDT0002: haven't heard from client f6791447-29b4-b589-1d38-f8760d88441c (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6bd265e800, cur 1573055666 expire 1573055516 last 1573055439 [223439.097687] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds [223439.107947] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 105 previous similar messages [223496.222459] LustreError: 68036:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli 09d35619-6b74-febd-1dd5-6d4a61665424 claims 28672 GRANT, real grant 0 [223496.236449] LustreError: 68036:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 12 previous similar messages [223861.347778] LNetError: 100669:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. 
Health = 900 [223861.359866] LNetError: 100669:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 130 previous similar messages [224041.113022] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds [224041.123278] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 117 previous similar messages [224047.390197] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.203@o2ib7: -125 [224047.402279] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 4 previous similar messages [224111.374164] LustreError: 68011:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli 09d35619-6b74-febd-1dd5-6d4a61665424 claims 28672 GRANT, real grant 0 [224111.388207] LustreError: 68011:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 12 previous similar messages [224462.123664] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900 [224462.135656] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 130 previous similar messages [224643.128149] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds [224643.138412] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 115 previous similar messages [224654.289457] Lustre: fir-MDT0002: haven't heard from client 97969a64-9725-a74e-a48f-d40c5952491a (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a5d64189000, cur 1573057023 expire 1573056873 last 1573056796 [224656.621452] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.203@o2ib7: -125 [224656.633549] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 2 previous similar messages [224756.062445] LustreError: 67881:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli 09d35619-6b74-febd-1dd5-6d4a61665424 claims 28672 GRANT, real grant 0 [224756.076459] LustreError: 67881:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 13 previous similar messages [225065.138690] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. 
Health = 900 [225065.150684] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 132 previous similar messages [225103.205777] Lustre: fir-MDT0002: Connection restored to 8b633e26-6674-1118-73a2-cadebb615cfb (at 10.8.23.14@o2ib6) [225244.143134] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds [225244.153395] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 115 previous similar messages [225257.820476] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.203@o2ib7: -125 [225257.832562] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 3 previous similar messages [225372.146325] LNetError: 67085:0:(o2iblnd_cb.c:3350:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds [225372.156587] LNetError: 67085:0:(o2iblnd_cb.c:3425:kiblnd_check_conns()) Timed out RDMA with 10.0.10.201@o2ib7 (15): c: 4, oc: 0, rc: 8 [225394.606399] LustreError: 67260:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli 09d35619-6b74-febd-1dd5-6d4a61665424 claims 28672 GRANT, real grant 0 [225394.620387] LustreError: 67260:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 14 previous similar messages [225581.312561] Lustre: fir-MDT0002: haven't heard from client e09495d0-032a-d4b4-c08c-91d6784a515e (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6eda48f000, cur 1573057950 expire 1573057800 last 1573057723 [225619.779208] Lustre: fir-MDT0002: Connection restored to 8b633e26-6674-1118-73a2-cadebb615cfb (at 10.8.23.14@o2ib6) [225667.153748] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900 [225667.165747] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 130 previous similar messages [225844.158170] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.201@o2ib7: 0 seconds [225844.168429] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 140 previous similar messages [226007.021774] LustreError: 92902:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli 09d35619-6b74-febd-1dd5-6d4a61665424 claims 28672 GRANT, real grant 0 [226007.035768] LustreError: 92902:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 17 previous similar messages [226134.159434] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.201@o2ib7: -125 [226134.171540] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 6 previous similar messages [226271.169938] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900 [226271.181934] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 100 previous similar messages [226446.173311] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.209@o2ib7: 0 seconds [226446.183573] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 105 previous similar messages [226549.336976] Lustre: fir-MDT0002: haven't heard from client 0c5ef618-c444-13e1-cbbe-65d31989c3ab (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9a6dd4a88c00, cur 1573058918 expire 1573058768 last 1573058691 [226609.772784] LustreError: 68008:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli 09d35619-6b74-febd-1dd5-6d4a61665424 claims 28672 GRANT, real grant 0 [226609.786775] LustreError: 68008:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 13 previous similar messages [226756.378919] Lustre: fir-MDT0002: Connection restored to 8b633e26-6674-1118-73a2-cadebb615cfb (at 10.8.23.14@o2ib6) [226777.182729] LNetError: 322:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.201@o2ib7 rejected: consumer defined fatal error [226782.301438] LNetError: 322:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.201@o2ib7 rejected: consumer defined fatal error [226792.304215] LNetError: 101547:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.201@o2ib7 rejected: consumer defined fatal error [226802.308030] LNetError: 101794:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.201@o2ib7 rejected: consumer defined fatal error [226812.313630] LNetError: 318:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.201@o2ib7 rejected: consumer defined fatal error [226822.321303] LNetError: 101794:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.201@o2ib7 rejected: consumer defined fatal error [226842.324249] LNetError: 101794:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.201@o2ib7 rejected: consumer defined fatal error [226842.335750] LNetError: 101794:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 1 previous similar message [226872.184090] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900 [226872.196110] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 100 previous similar messages [226882.328786] LNetError: 101794:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.201@o2ib7 rejected: consumer defined fatal error [226882.340292] LNetError: 101794:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 2 previous similar messages [226957.342385] LNetError: 101794:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.201@o2ib7 rejected: consumer defined fatal error [226957.353861] LNetError: 101794:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 5 previous similar messages [227015.131375] Lustre: fir-MDT0002: Connection restored to 8b633e26-6674-1118-73a2-cadebb615cfb (at 10.8.23.14@o2ib6) [227059.349731] Lustre: fir-MDT0002: haven't heard from client 339c4a60-862f-1975-d3cc-80363d1a4b2f (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a8d93bf5000, cur 1573059428 expire 1573059278 last 1573059201 [227086.371452] LNetError: 101794:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.201@o2ib7 rejected: consumer defined fatal error [227086.382931] LNetError: 101794:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 14 previous similar messages [227209.915794] LustreError: 68123:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0002: cli 09d35619-6b74-febd-1dd5-6d4a61665424 claims 28672 GRANT, real grant 0 [227209.929782] LustreError: 68123:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 15 previous similar messages [227262.193666] LNetError: 67085:0:(o2iblnd_cb.c:3350:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds [227262.203932] LNetError: 67085:0:(o2iblnd_cb.c:3425:kiblnd_check_conns()) Timed out RDMA with 10.0.10.202@o2ib7 (6): c: 5, oc: 0, rc: 8 [227262.216272] LNetError: 67087:0:(peer.c:3451:lnet_peer_ni_add_to_recoveryq_locked()) lpni 10.0.10.202@o2ib7 added to recovery queue. 
Health = 900 [227262.861761] Lustre: fir-MDT0002: Client bc041a60-e809-be20-8115-06e966b35673 (at 10.9.110.14@o2ib4) reconnecting [227262.872021] Lustre: Skipped 1 previous similar message [227262.877277] Lustre: fir-MDT0002: Connection restored to (at 10.9.110.14@o2ib4) [227265.161888] Lustre: fir-MDT0002: Connection restored to 60e7dd38-7049-6086-949c-b7f68f3f00ca (at 10.8.23.18@o2ib6) [227265.172333] Lustre: Skipped 36 previous similar messages [227265.340756] Lustre: 67667:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1573059626/real 1573059626] req@ffff9a56aaeeb180 x1649330456354912/t0(0) o104->fir-MDT0002@10.9.108.36@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1573059633 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 [227269.167553] Lustre: fir-MDT0002: Connection restored to c3640923-95d6-e034-c6ab-f28c1a95bebd (at 10.9.104.38@o2ib4) [227269.178109] Lustre: Skipped 74 previous similar messages [227270.875752] Lustre: fir-MDT0002: Client e6122ed7-ba94-2cf0-06a2-d452873eaed0 (at 10.9.105.3@o2ib4) reconnecting [227270.885928] Lustre: Skipped 145 previous similar messages [227277.219336] Lustre: fir-MDT0002: Connection restored to b44d8559-b6fa-c6ac-9733-9495841decff (at 10.8.17.25@o2ib6) [227277.229771] Lustre: Skipped 104 previous similar messages [227282.317176] Lustre: 67789:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1573059643/real 1573059643] req@ffff9a5ba2219200 x1649330456378160/t0(0) o104->fir-MDT0002@10.9.110.57@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1573059650 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 [227283.466204] LustreError: 68019:0:(ldlm_lib.c:3256:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff9a6068e2d050 x1649307330518016/t0(0) o4->101c7f86-2210-a696-f6b9-c9f6ce50226a@10.9.117.46@o2ib4:573/0 lens 504/448 e 0 to 0 dl 1573059683 ref 1 fl Interpret:/0/0 rc 0/0 [227283.490429] LustreError: 68019:0:(ldlm_lib.c:3256:target_bulk_io()) Skipped 53 previous similar messages [227283.500016] Lustre: fir-MDT0002: Bulk IO write error with 101c7f86-2210-a696-f6b9-c9f6ce50226a (at 10.9.117.46@o2ib4), client will retry: rc = -110 [227283.513308] Lustre: Skipped 58 previous similar messages [227286.879513] Lustre: fir-MDT0002: Client 2b1c79f0-7e80-9fb6-9652-d84b00c6c331 (at 10.8.30.28@o2ib6) reconnecting [227286.889688] Lustre: Skipped 124 previous similar messages [227345.255182] LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.9.107.63@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. [227345.272647] LustreError: Skipped 869 previous similar messages [227351.409040] LNetError: 318:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.201@o2ib7 rejected: consumer defined fatal error [227351.420260] LNetError: 318:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 26 previous similar messages [227407.197269] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.202@o2ib7: 0 seconds [227407.207528] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 75 previous similar messages [227478.200025] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900 [227478.212067] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 80 previous similar messages [227536.434337] LNetError: 67102:0:(peer.c:3451:lnet_peer_ni_add_to_recoveryq_locked()) lpni 10.0.10.210@o2ib7 added to recovery queue. 
Health = 900 [227537.447473] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.202@o2ib7: -125 [227537.459560] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 2 previous similar messages [227837.507020] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.202@o2ib7: -125 [227837.519115] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 1 previous similar message [227871.534597] LNetError: 102070:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.201@o2ib7 rejected: consumer defined fatal error [227871.546083] LNetError: 102070:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 45 previous similar messages [228012.212454] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.202@o2ib7: 0 seconds [228012.222713] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 95 previous similar messages [228082.214249] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900 [228082.226242] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 132 previous similar messages [228138.555682] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.202@o2ib7: -125 [228282.380429] Lustre: fir-MDT0002: haven't heard from client 92a3d6f6-1c5f-ab24-af9c-46cc39024523 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a72d8f59c00, cur 1573060651 expire 1573060501 last 1573060424 [228313.570486] Lustre: fir-MDT0002: Connection restored to 8b633e26-6674-1118-73a2-cadebb615cfb (at 10.8.23.14@o2ib6) [228313.580925] Lustre: Skipped 63 previous similar messages [228445.595531] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.202@o2ib7: -125 [228445.607614] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 1 previous similar message [228476.621641] LNetError: 101794:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.201@o2ib7 rejected: consumer defined fatal error [228476.633127] LNetError: 101794:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 56 previous similar messages [228613.227905] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.210@o2ib7: 1 seconds [228613.238162] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 106 previous similar messages [228687.229827] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900 [228687.241824] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 125 previous similar messages [229076.663848] LNetError: 39235:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.201@o2ib7 rejected: consumer defined fatal error [229076.675251] LNetError: 39235:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 52 previous similar messages [229118.406022] Lustre: fir-MDT0002: haven't heard from client ad881221-b6bd-6ae9-0501-2c04b28fa7f4 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a7861466000, cur 1573061487 expire 1573061337 last 1573061260 [229296.679446] LNetError: 39235:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. 
Health = 900 [229296.691454] LNetError: 39235:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 67 previous similar messages [229639.557706] Lustre: fir-MDT0002: Connection restored to 8b633e26-6674-1118-73a2-cadebb615cfb (at 10.8.23.14@o2ib6) [229681.724601] LNetError: 102433:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.201@o2ib7 rejected: consumer defined fatal error [229681.736083] LNetError: 102433:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [229706.255798] LNetError: 67085:0:(o2iblnd_cb.c:3350:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds [229706.266053] LNetError: 67085:0:(o2iblnd_cb.c:3350:kiblnd_check_txs_locked()) Skipped 1 previous similar message [229706.276220] LNetError: 67085:0:(o2iblnd_cb.c:3425:kiblnd_check_conns()) Timed out RDMA with 10.0.10.211@o2ib7 (6): c: 3, oc: 0, rc: 8 [229706.288291] LNetError: 67085:0:(o2iblnd_cb.c:3425:kiblnd_check_conns()) Skipped 1 previous similar message [229706.298228] LNetError: 67095:0:(peer.c:3451:lnet_peer_ni_add_to_recoveryq_locked()) lpni 10.0.10.211@o2ib7 added to recovery queue. Health = 900 [229706.625932] Lustre: fir-MDT0002: Client 153269ce-bf5e-2d19-09ea-053cf324c52b (at 10.9.104.18@o2ib4) reconnecting [229706.636200] Lustre: Skipped 9 previous similar messages [229706.641539] Lustre: fir-MDT0002: Connection restored to (at 10.9.104.18@o2ib4) [229707.761222] Lustre: fir-MDT0002: Connection restored to 8b327a74-be7d-fd75-0b06-01a3a60b4f4d (at 10.9.0.2@o2ib4) [229707.771487] Lustre: Skipped 11 previous similar messages [229709.781700] Lustre: fir-MDT0002: Connection restored to b09e0a4e-a579-93c4-18c8-8c82fede88b9 (at 10.9.110.31@o2ib4) [229709.792224] Lustre: Skipped 30 previous similar messages [229710.652862] Lustre: fir-MDT0002: Client 7128bbc5-55d8-ff02-9d63-ba25c68604fa (at 10.9.101.7@o2ib4) reconnecting [229710.663037] Lustre: Skipped 54 previous similar messages [229713.807290] Lustre: fir-MDT0002: Connection restored to 4f141061-ae18-894c-4f72-bd8bac8d1dd7 (at 10.9.109.8@o2ib4) [229713.817735] Lustre: Skipped 52 previous similar messages [229718.801366] Lustre: fir-MDT0002: Client 44283a4f-50c0-87b5-8240-98a4faecd82d (at 10.9.105.65@o2ib4) reconnecting [229718.811626] Lustre: Skipped 85 previous similar messages [229720.624160] Lustre: 67858:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1573062082/real 1573062082] req@ffff9a91279aec00 x1649330468347152/t0(0) o104->fir-MDT0002@10.9.117.5@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1573062089 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 [229721.902469] Lustre: fir-MDT0002: Connection restored to (at 10.9.110.57@o2ib4) [229721.909867] Lustre: Skipped 69 previous similar messages [229790.843331] LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.9.101.52@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. 
[229814.488852] Lustre: fir-MDT0002: Client fd8f31dd-f57e-08f9-cad5-8fd7064f652e (at 10.9.110.29@o2ib4) reconnecting [229814.499120] Lustre: Skipped 63 previous similar messages [229814.504575] Lustre: fir-MDT0002: Connection restored to fd8f31dd-f57e-08f9-cad5-8fd7064f652e (at 10.9.110.29@o2ib4) [229814.515103] Lustre: Skipped 38 previous similar messages [229863.259786] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.211@o2ib7: 1 seconds [229863.270042] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 19 previous similar messages [229897.260659] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900 [229897.272671] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 65 previous similar messages [229942.261803] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.211@o2ib7: 0 seconds [229942.272066] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 6 previous similar messages [229982.756781] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.211@o2ib7: -125 [230102.265838] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.211@o2ib7: 0 seconds [230102.276101] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 15 previous similar messages [230277.832236] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.211@o2ib7: -125 [230286.851717] LNetError: 101205:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.201@o2ib7 rejected: consumer defined fatal error [230286.863201] LNetError: 101205:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 57 previous similar messages [230402.273401] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.211@o2ib7: 0 seconds [230402.283661] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 28 previous similar messages [230408.459521] Lustre: 67829:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1573062770/real 1573062770] req@ffff9a5b64f13a80 x1649330475672320/t0(0) o106->fir-MDT0002@10.8.23.14@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1573062777 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 [230408.486952] Lustre: 67829:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 1 previous similar message [230415.459701] Lustre: 67751:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1573062777/real 1573062777] req@ffff9a609c390d80 x1649330475672336/t0(0) o106->fir-MDT0002@10.8.23.14@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1573062784 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [230422.486893] Lustre: 67751:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1573062784/real 1573062784] req@ffff9a609c390d80 x1649330475672336/t0(0) o106->fir-MDT0002@10.8.23.14@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1573062791 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [230422.514233] Lustre: 67751:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 2 previous similar messages [230429.497059] Lustre: 67829:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1573062791/real 1573062791] req@ffff9a5b64f13a80 x1649330475672320/t0(0) o106->fir-MDT0002@10.8.23.14@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1573062798 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [230429.524402] Lustre: 
67829:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 1 previous similar message [230436.524253] Lustre: 67751:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1573062798/real 1573062798] req@ffff9a609c390d80 x1649330475672336/t0(0) o106->fir-MDT0002@10.8.23.14@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1573062805 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [230436.551642] Lustre: 67751:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 1 previous similar message [230450.535588] Lustre: 67829:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1573062812/real 1573062812] req@ffff9a5b64f13a80 x1649330475672320/t0(0) o106->fir-MDT0002@10.8.23.14@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1573062819 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [230450.562956] Lustre: 67829:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 3 previous similar messages [230471.564123] Lustre: 67751:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1573062833/real 1573062833] req@ffff9a609c390d80 x1649330475672336/t0(0) o106->fir-MDT0002@10.8.23.14@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1573062840 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [230471.591469] Lustre: 67751:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 5 previous similar messages [230498.275871] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900 [230498.287870] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 115 previous similar messages [230498.307616] LustreError: 67829:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) ### client (nid 10.8.23.14@o2ib6) returned error from glimpse AST (req@ffff9a5b64f13a80 x1649330475672320 status -107 rc -107), evict it ns: mdt-fir-MDT0002_UUID lock: ffff9a6031620b40/0x746adaf8bec89785 lrc: 4/0,0 mode: PW/PW res: [0x2c003362d:0x3:0x0].0x0 bits 0x40/0x0 rrc: 8 type: IBT flags: 0x40200000000000 nid: 10.8.23.14@o2ib6 remote: 0x317571bc1eec3bfd expref: 41 pid: 67757 timeout: 0 lvb_type: 0 [230498.307621] LustreError: 138-a: fir-MDT0002: A client on nid 10.8.23.14@o2ib6 was evicted due to a lock blocking callback time out: rc -107 [230498.307648] LustreError: 67208:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 4s: evicting client at 10.8.23.14@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff9a81a13e8900/0x746adaf8bec8f277 lrc: 3/0,0 mode: PR/PR res: [0x2c003140d:0x16af:0x0].0x0 bits 0x13/0x0 rrc: 13 type: IBT flags: 0x60200400000020 nid: 10.8.23.14@o2ib6 remote: 0x317571bc1eec3ca5 expref: 42 pid: 67941 timeout: 0 lvb_type: 0 [230498.399877] LustreError: 67829:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) Skipped 2 previous similar messages [230528.076431] Lustre: fir-MDT0002: Connection restored to 8b633e26-6674-1118-73a2-cadebb615cfb (at 10.8.23.14@o2ib6) [230528.086868] Lustre: Skipped 1 previous similar message [230577.903798] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.211@o2ib7: -125 [230780.443894] Lustre: fir-MDT0002: haven't heard from client aa597357-5449-554f-bcd5-9568259fcfe8 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9a77238c8800, cur 1573063149 expire 1573062999 last 1573062922 [230829.640222] Lustre: fir-MDT0002: Connection restored to 8b633e26-6674-1118-73a2-cadebb615cfb (at 10.8.23.14@o2ib6) [230879.989362] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.211@o2ib7: -125 [230892.008263] LNetError: 102433:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.201@o2ib7 rejected: consumer defined fatal error [230892.019752] LNetError: 102433:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 63 previous similar messages [231008.288584] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.211@o2ib7: 1 seconds [231008.298844] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 59 previous similar messages [231102.290943] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900 [231102.302946] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 116 previous similar messages [231104.706691] Lustre: fir-MDT0002: Connection restored to 8b633e26-6674-1118-73a2-cadebb615cfb (at 10.8.23.14@o2ib6) [231107.452047] Lustre: fir-MDT0002: haven't heard from client 59289ecc-4302-1945-63ff-8fb8abd4c3e7 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a53ad3c8400, cur 1573063476 expire 1573063326 last 1573063249 [231558.147285] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.201@o2ib7: -125 [231558.159372] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 1 previous similar message [231617.303796] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.201@o2ib7: 0 seconds [231617.314059] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 48 previous similar messages [231702.305898] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. 
Health = 900 [231702.317910] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 73 previous similar messages [231847.957428] Lustre: 67738:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1573064209/real 1573064209] req@ffff9a51a7caad00 x1649330496804208/t0(0) o106->fir-MDT0002@10.8.23.14@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1573064216 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 [231847.984769] Lustre: 67738:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 6 previous similar messages [231854.994605] Lustre: 67738:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1573064216/real 1573064216] req@ffff9a51a7caad00 x1649330496804208/t0(0) o106->fir-MDT0002@10.8.23.14@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1573064223 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [231869.021953] Lustre: 67738:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1573064230/real 1573064230] req@ffff9a51a7caad00 x1649330496804208/t0(0) o106->fir-MDT0002@10.8.23.14@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1573064237 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [231869.049296] Lustre: 67738:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 1 previous similar message [231890.059477] Lustre: 67738:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1573064251/real 1573064251] req@ffff9a51a7caad00 x1649330496804208/t0(0) o106->fir-MDT0002@10.8.23.14@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1573064258 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [231890.086818] Lustre: 67738:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 2 previous similar messages [231925.098344] Lustre: 67738:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1573064286/real 1573064286] req@ffff9a51a7caad00 x1649330496804208/t0(0) o106->fir-MDT0002@10.8.23.14@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1573064293 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [231925.125685] Lustre: 67738:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 4 previous similar messages [231995.141035] Lustre: 67738:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1573064356/real 1573064356] req@ffff9a51a7caad00 x1649330496804208/t0(0) o106->fir-MDT0002@10.8.23.14@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1573064363 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [231995.168373] Lustre: 67738:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 9 previous similar messages [232023.180018] LustreError: 67738:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) ### client (nid 10.8.23.14@o2ib6) returned error from glimpse AST (req@ffff9a51a7caad00 x1649330496804208 status -107 rc -107), evict it ns: mdt-fir-MDT0002_UUID lock: ffff9a73555e0d80/0x746adaf8d3beb893 lrc: 4/0,0 mode: PW/PW res: [0x2c0033632:0x3:0x0].0x0 bits 0x40/0x0 rrc: 8 type: IBT flags: 0x40200000000000 nid: 10.8.23.14@o2ib6 remote: 0xfdc05c4803deac1 expref: 41 pid: 67219 timeout: 0 lvb_type: 0 [232023.222276] LustreError: 138-a: fir-MDT0002: A client on nid 10.8.23.14@o2ib6 was evicted due to a lock glimpse callback time out: rc -107 [232023.234792] LustreError: Skipped 2 previous similar messages [232023.240569] LustreError: 67208:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 382s: evicting client at 10.8.23.14@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff9a73555e0d80/0x746adaf8d3beb893 lrc: 3/0,0 mode: PW/PW res: [0x2c0033632:0x3:0x0].0x0 bits 0x40/0x0 
rrc: 8 type: IBT flags: 0x40200000000000 nid: 10.8.23.14@o2ib6 remote: 0xfdc05c4803deac1 expref: 42 pid: 67219 timeout: 0 lvb_type: 0 [232025.832885] Lustre: fir-MDT0002: Connection restored to 8b633e26-6674-1118-73a2-cadebb615cfb (at 10.8.23.14@o2ib6) [232223.318692] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.201@o2ib7: 0 seconds [232223.328954] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 58 previous similar messages [232308.320786] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900 [232308.332787] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 57 previous similar messages [232404.125157] Lustre: 67678:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1573064765/real 1573064765] req@ffff9a71b9503a80 x1649330507887984/t0(0) o106->fir-MDT0002@10.8.23.14@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1573064772 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 [232404.152497] Lustre: 67678:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 4 previous similar messages [232454.389397] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.201@o2ib7: -125 [232454.401499] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 2 previous similar messages [232466.413721] Lustre: fir-MDT0002: Connection restored to (at 10.9.106.15@o2ib4) [232482.888041] Lustre: fir-MDT0002: Connection restored to (at 10.9.109.14@o2ib4) [232530.169776] LustreError: 67678:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) ### client (nid 10.8.23.14@o2ib6) returned error from glimpse AST (req@ffff9a71b9503a80 x1649330507887984 status -107 rc -107), evict it ns: mdt-fir-MDT0002_UUID lock: ffff9a6fa27b0b40/0x746adaf8ddd8bc84 lrc: 4/0,0 mode: PW/PW res: [0x2c0033633:0x3:0x0].0x0 bits 0x40/0x0 rrc: 8 type: IBT flags: 0x40200000000000 nid: 10.8.23.14@o2ib6 remote: 0x48399eddeb006006 expref: 41 pid: 67942 timeout: 0 lvb_type: 0 [232530.212114] LustreError: 67678:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) Skipped 2 previous similar messages [232530.222293] LustreError: 138-a: fir-MDT0002: A client on nid 10.8.23.14@o2ib6 was evicted due to a lock glimpse callback time out: rc -107 [232530.234848] LustreError: Skipped 2 previous similar messages [232530.240622] LustreError: 67208:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 282s: evicting client at 10.8.23.14@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff9a6fa27b0b40/0x746adaf8ddd8bc84 lrc: 3/0,0 mode: PW/PW res: [0x2c0033633:0x3:0x0].0x0 bits 0x40/0x0 rrc: 7 type: IBT flags: 0x40200000000000 nid: 10.8.23.14@o2ib6 remote: 0x48399eddeb006006 expref: 42 pid: 67942 timeout: 0 lvb_type: 0 [232544.572969] Lustre: fir-MDT0002: Connection restored to 8b633e26-6674-1118-73a2-cadebb615cfb (at 10.8.23.14@o2ib6) [232607.453336] LNetError: 102267:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.201@o2ib7 rejected: consumer defined fatal error [232607.464818] LNetError: 102267:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 30 previous similar messages [232682.469298] LNetError: 102588:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.201@o2ib7 rejected: consumer defined fatal error [232682.480783] LNetError: 102588:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 6 previous similar messages [232841.496947] LNetError: 102588:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.201@o2ib7 rejected: consumer defined fatal error 
[232841.508430] LNetError: 102588:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 16 previous similar messages [232921.504021] LNetError: 102224:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900 [232921.516106] LNetError: 102224:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 57 previous similar messages [233141.547583] LNetError: 103329:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.201@o2ib7 rejected: consumer defined fatal error [233141.559084] LNetError: 103329:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 29 previous similar messages [233270.505920] Lustre: fir-MDT0002: haven't heard from client c2a9fd2e-2002-17f5-702d-f38b0db22b38 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6f41a62000, cur 1573065639 expire 1573065489 last 1573065412 [233329.198315] Lustre: fir-MDT0002: Connection restored to 8b633e26-6674-1118-73a2-cadebb615cfb (at 10.8.23.14@o2ib6) [233531.609833] LNetError: 103329:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900 [233531.621919] LNetError: 103329:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 59 previous similar messages [233746.646906] LNetError: 101724:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.201@o2ib7 rejected: consumer defined fatal error [233746.658385] LNetError: 101724:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 58 previous similar messages [233907.361149] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.201@o2ib7: 0 seconds [233907.371410] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 35 previous similar messages [233966.664662] LNetError: 67102:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.201@o2ib7: -125 [233987.363213] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.201@o2ib7: 0 seconds [233987.373468] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 8 previous similar messages [233999.124747] LNet: 102676:0:(o2iblnd_cb.c:2601:kiblnd_passive_connect()) Conn stale 10.0.10.201@o2ib7 version 12/12 incarnation 1573066367011086/1573066367011086 [233999.363490] LNet: 67085:0:(o2iblnd_cb.c:1510:kiblnd_reconnect_peer()) Abort reconnection of 10.0.10.201@o2ib7: connected [234468.346305] Lustre: fir-MDT0002: Connection restored to 87024fcd-e9de-4931-86b0-b8038d2cef0f (at 10.8.30.15@o2ib6) [235301.558337] Lustre: fir-MDT0002: haven't heard from client 6d634815-692b-76c7-6e76-0a9d3a981619 (at 10.9.106.20@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9a8c98b68800, cur 1573067670 expire 1573067520 last 1573067443 [235369.560502] Lustre: fir-MDT0002: Connection restored to 5413846e-ce31-ae3b-cf16-0a735f78aa9a (at 10.9.113.12@o2ib4) [235993.582037] list passed to list_sort() too long for efficiency [235996.414884] LNetError: 67085:0:(o2iblnd_cb.c:3350:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds [235996.425147] LNetError: 67085:0:(o2iblnd_cb.c:3425:kiblnd_check_conns()) Timed out RDMA with 10.0.10.202@o2ib7 (5): c: 5, oc: 0, rc: 8 [235997.414904] LNetError: 67085:0:(o2iblnd_cb.c:3350:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds [235997.425158] LNetError: 67085:0:(o2iblnd_cb.c:3350:kiblnd_check_txs_locked()) Skipped 1 previous similar message [235997.435325] LNetError: 67085:0:(o2iblnd_cb.c:3425:kiblnd_check_conns()) Timed out RDMA with 10.0.10.102@o2ib7 (6): c: 2, oc: 0, rc: 8 [235997.447397] LNetError: 67085:0:(o2iblnd_cb.c:3425:kiblnd_check_conns()) Skipped 1 previous similar message [235997.576154] Lustre: fir-MDT0002: Client 4049a6d6-6a6a-a98d-f3e8-73a37df31486 (at 10.8.21.35@o2ib6) reconnecting [235997.586330] Lustre: Skipped 1 previous similar message [235997.591604] Lustre: fir-MDT0002: Connection restored to 4049a6d6-6a6a-a98d-f3e8-73a37df31486 (at 10.8.21.35@o2ib6) [235997.749920] LustreError: 92901:0:(ldlm_lib.c:3256:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff9a797f818850 x1649416795595232/t0(0) o4->ac2f8831-eb52-3a99-b876-6ffdc2f892de@10.9.106.16@o2ib4:222/0 lens 488/448 e 1 to 0 dl 1573068392 ref 1 fl Interpret:/0/0 rc 0/0 [235997.774170] Lustre: fir-MDT0002: Bulk IO write error with ac2f8831-eb52-3a99-b876-6ffdc2f892de (at 10.9.106.16@o2ib4), client will retry: rc = -110 [235998.220638] Lustre: fir-MDT0002: Connection restored to 2beb25c9-933a-3e97-3fab-c8f1d7e32c23 (at 10.8.21.12@o2ib6) [235998.231073] Lustre: Skipped 3 previous similar messages [235998.232935] Lustre: 67953:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1573068359/real 0] req@ffff9a7760e0de80 x1649330524350496/t0(0) o104->fir-MDT0002@10.9.105.3@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1573068366 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 [235998.232938] Lustre: 67953:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 18 previous similar messages [235998.557961] Lustre: fir-OST0018-osc-MDT0002: Connection to fir-OST0018 (at 10.0.10.105@o2ib7) was lost; in progress operations using this service will wait for recovery to complete [235998.657941] LustreError: 92660:0:(ldlm_lib.c:3256:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff9a8b2db6b850 x1649416795594896/t0(0) o4->ac2f8831-eb52-3a99-b876-6ffdc2f892de@10.9.106.16@o2ib4:234/0 lens 488/448 e 0 to 0 dl 1573068404 ref 1 fl Interpret:/0/0 rc 0/0 [235998.682152] LustreError: 92660:0:(ldlm_lib.c:3256:target_bulk_io()) Skipped 3 previous similar messages [235999.230587] Lustre: fir-MDT0002: Connection restored to e6122ed7-ba94-2cf0-06a2-d452873eaed0 (at 10.9.105.3@o2ib4) [235999.241020] Lustre: Skipped 22 previous similar messages [235999.288961] Lustre: fir-OST001a-osc-MDT0002: Connection to fir-OST001a (at 10.0.10.105@o2ib7) was lost; in progress operations using this service will wait for recovery to complete [235999.305111] Lustre: Skipped 1 previous similar message [235999.351994] Lustre: fir-MDT0002: Bulk IO write error with f0a8ec9b-fbf5-a8d2-cba4-506dafb70319 (at 10.9.110.5@o2ib4), client will retry: rc = -110 [235999.365213] Lustre: Skipped 5 previous similar messages [235999.414962] LNet: 
67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.102@o2ib7: 2418 seconds [235999.425484] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900 [235999.437472] LNetError: 67085:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 44 previous similar messages [236000.414995] LNetError: 67085:0:(o2iblnd_cb.c:3350:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds [236000.425254] LNetError: 67085:0:(o2iblnd_cb.c:3350:kiblnd_check_txs_locked()) Skipped 1 previous similar message [236000.435422] LNetError: 67085:0:(o2iblnd_cb.c:3425:kiblnd_check_conns()) Timed out RDMA with 10.0.10.209@o2ib7 (5): c: 7, oc: 0, rc: 8 [236000.447494] LNetError: 67085:0:(o2iblnd_cb.c:3425:kiblnd_check_conns()) Skipped 1 previous similar message [236001.038015] Lustre: fir-OST0022-osc-MDT0002: Connection to fir-OST0022 (at 10.0.10.105@o2ib7) was lost; in progress operations using this service will wait for recovery to complete [236001.054169] Lustre: Skipped 3 previous similar messages [236001.599942] Lustre: fir-MDT0002: Client 79da3557-0dd8-94ba-46e3-b3332f203b06 (at 10.9.110.26@o2ib4) reconnecting [236001.610199] Lustre: Skipped 52 previous similar messages [236001.615623] Lustre: fir-MDT0002: Connection restored to 79da3557-0dd8-94ba-46e3-b3332f203b06 (at 10.9.110.26@o2ib4) [236001.626139] Lustre: Skipped 19 previous similar messages [236001.946050] LNetError: 67102:0:(peer.c:3451:lnet_peer_ni_add_to_recoveryq_locked()) lpni 10.0.10.209@o2ib7 added to recovery queue. Health = 900 [236002.415201] LNetError: 67085:0:(o2iblnd_cb.c:3350:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds [236002.425459] LNetError: 67085:0:(o2iblnd_cb.c:3350:kiblnd_check_txs_locked()) Skipped 6 previous similar messages [236002.435722] LNetError: 67085:0:(o2iblnd_cb.c:3425:kiblnd_check_conns()) Timed out RDMA with 10.0.10.104@o2ib7 (8): c: 2, oc: 0, rc: 8 [236002.447825] LNetError: 67085:0:(o2iblnd_cb.c:3425:kiblnd_check_conns()) Skipped 6 previous similar messages [236003.686071] Lustre: fir-OST0028-osc-MDT0002: Connection to fir-OST0028 (at 10.0.10.107@o2ib7) was lost; in progress operations using this service will wait for recovery to complete [236003.702219] Lustre: Skipped 20 previous similar messages [236004.739093] LustreError: 92910:0:(ldlm_lib.c:3256:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff9a9106282050 x1649416795594896/t0(0) o4->ac2f8831-eb52-3a99-b876-6ffdc2f892de@10.9.106.16@o2ib4:239/0 lens 488/448 e 0 to 0 dl 1573068409 ref 1 fl Interpret:/2/0 rc 0/0 [236004.763312] LustreError: 92910:0:(ldlm_lib.c:3256:target_bulk_io()) Skipped 2 previous similar messages [236004.772844] Lustre: fir-MDT0002: Bulk IO write error with ac2f8831-eb52-3a99-b876-6ffdc2f892de (at 10.9.106.16@o2ib4), client will retry: rc = -110 [236005.624310] Lustre: fir-MDT0002: Connection restored to 09a03217-f2a1-2632-097f-38339f6cbc7c (at 10.8.22.1@o2ib6) [236005.634667] Lustre: Skipped 63 previous similar messages [236007.694170] Lustre: fir-OST0043-osc-MDT0002: Connection to fir-OST0043 (at 10.0.10.112@o2ib7) was lost; in progress operations using this service will wait for recovery to complete [236007.710320] Lustre: Skipped 17 previous similar messages [236009.709252] Lustre: fir-MDT0002: Client bdb2a993-354c-ddce-bf9d-5960b01c7975 (at 10.8.23.13@o2ib6) reconnecting [236009.719429] Lustre: Skipped 124 previous similar messages [236013.664904] Lustre: fir-MDT0002: Connection restored to (at 10.8.22.10@o2ib6) [236013.672228] 
Lustre: Skipped 105 previous similar messages [236015.116346] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [mdt_rdpg00_011:70180] [236015.124342] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [mdt_rdpg01_016:70178] [236015.124386] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ses enclosure ghash_clmulni_intel dcdbas aesni_intel lrw gf128mul glue_helper ablk_helper ipmi_si cryptd sg ipmi_devintf pcspkr ccp ipmi_msghandler i2c_piix4 k10temp dm_multipath acpi_power_meter dm_mod ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) [236015.124401] ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt mlx5_core(OE) fb_sys_fops ttm mlxfw(OE) devlink ahci libahci mpt3sas(OE) drm tg3 crct10dif_pclmul mlx_compat(OE) crct10dif_common raid_class crc32c_intel libata ptp megaraid_sas scsi_transport_sas drm_panel_orientation_quirks pps_core [last unloaded: libcfs] [236015.124404] CPU: 1 PID: 70178 Comm: mdt_rdpg01_016 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 [236015.124405] Hardware name: Dell Inc. PowerEdge R6415/07YXFK, BIOS 1.10.6 08/15/2019 [236015.124407] task: ffff9a91a6536180 ti: ffff9a6fb7228000 task.ti: ffff9a6fb7228000 [236015.124416] RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 [236015.124417] RSP: 0018:ffff9a6fb722b6c0 EFLAGS: 00000246 [236015.124419] RAX: 0000000000000000 RBX: ffff9a83ed2798f0 RCX: 0000000000090000 [236015.124420] RDX: ffff9a91ff69b780 RSI: 0000000001590101 RDI: ffff9a7b0c800c80 [236015.124420] RBP: ffff9a6fb722b6c0 R08: ffff9a71bf61b780 R09: 0000000000000000 [236015.124421] R10: ffff9a71bf61f140 R11: fffff4b4cc18fc00 R12: ffff9a6fb722b690 [236015.124422] R13: ffff9a81ac460501 R14: 0000000000000001 R15: ffffffffc050b2d1 [236015.124423] FS: 00007f32e01b3700(0000) GS:ffff9a71bf600000(0000) knlGS:0000000000000000 [236015.124424] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [236015.124425] CR2: 00007f32e0224000 CR3: 0000003c68610000 CR4: 00000000003407e0 [236015.124426] Call Trace: [236015.124433] [] queued_spin_lock_slowpath+0xb/0xf [236015.124436] [] _raw_spin_lock+0x20/0x30 [236015.124451] [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] [236015.124459] [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] [236015.124472] [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] [236015.124476] [] ? wake_up_bit+0x25/0x30 [236015.124487] [] ldiskfs_getblk+0x65/0x200 [ldiskfs] [236015.124497] [] ldiskfs_bread+0x27/0xc0 [ldiskfs] [236015.124501] [] ? __find_get_block+0xbc/0x120 [236015.124509] [] __ldiskfs_read_dirblock+0x4a/0x400 [ldiskfs] [236015.124516] [] htree_dirblock_to_tree+0x40/0x190 [ldiskfs] [236015.124524] [] ldiskfs_htree_fill_tree+0xb5/0x2f0 [ldiskfs] [236015.124531] [] ? kmem_cache_alloc_trace+0x1d6/0x200 [236015.124537] [] ldiskfs_readdir+0x61c/0x850 [ldiskfs] [236015.124547] [] ? osd_object_alloc+0x360/0x360 [osd_ldiskfs] [236015.124550] [] ? 
generic_getxattr+0x52/0x70 [236015.124557] [] osd_ldiskfs_it_fill+0xbe/0x260 [osd_ldiskfs] [236015.124564] [] osd_it_ea_load+0x37/0x100 [osd_ldiskfs] [236015.124576] [] lod_it_load+0x27/0x90 [lod] [236015.124607] [] dt_index_walk+0xf8/0x430 [obdclass] [236015.124620] [] ? mdd_object_lock+0xe0/0xe0 [mdd] [236015.124629] [] mdd_readpage+0x25f/0x5a0 [mdd] [236015.124644] [] mdt_readpage+0x63a/0x880 [mdt] [236015.124700] [] tgt_request_handle+0xaea/0x1580 [ptlrpc] [236015.124739] [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [236015.124747] [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [236015.124784] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [236015.124820] [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [236015.124824] [] ? __wake_up+0x44/0x50 [236015.124859] [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [236015.124895] [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [236015.124897] [] kthread+0xd1/0xe0 [236015.124898] [] ? insert_kthread_work+0x40/0x40 [236015.124902] [] ret_from_fork_nospec_begin+0xe/0x21 [236015.124904] [] ? insert_kthread_work+0x40/0x40 [236015.124924] Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf b4 b4 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b [236015.150342] NMI watchdog: BUG: soft lockup - CPU#4 stuck for 23s! [mdt_io00_009:67981] [236015.150392] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ses enclosure ghash_clmulni_intel dcdbas aesni_intel lrw gf128mul glue_helper ablk_helper ipmi_si cryptd sg ipmi_devintf pcspkr ccp ipmi_msghandler i2c_piix4 k10temp dm_multipath acpi_power_meter dm_mod ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) [236015.150408] ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt mlx5_core(OE) fb_sys_fops ttm mlxfw(OE) devlink ahci libahci mpt3sas(OE) drm tg3 crct10dif_pclmul mlx_compat(OE) crct10dif_common raid_class crc32c_intel libata ptp megaraid_sas scsi_transport_sas drm_panel_orientation_quirks pps_core [last unloaded: libcfs] [236015.150412] CPU: 4 PID: 67981 Comm: mdt_io00_009 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 [236015.150412] Hardware name: Dell Inc. 
PowerEdge R6415/07YXFK, BIOS 1.10.6 08/15/2019 [236015.150414] task: ffff9a60f5a52080 ti: ffff9a60c3ed0000 task.ti: ffff9a60c3ed0000 [236015.150423] RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 [236015.150424] RSP: 0018:ffff9a60c3ed3798 EFLAGS: 00000246 [236015.150425] RAX: 0000000000000000 RBX: ffff9a61b0df2030 RCX: 0000000000210000 [236015.150426] RDX: ffff9a61bee9b780 RSI: 0000000000410101 RDI: ffff9a7b0c800c80 [236015.150427] RBP: ffff9a60c3ed3798 R08: ffff9a61bee5b780 R09: 0000000000000000 [236015.150427] R10: ffff9a61bee5f140 R11: fffff4b46a5d4000 R12: ffffffffc159c71b [236015.150428] R13: ffff9a60c3ed3758 R14: 000000000000081f R15: ffffffffc16c0ea0 [236015.150430] FS: 00007f3c307ae880(0000) GS:ffff9a61bee40000(0000) knlGS:0000000000000000 [236015.150431] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [236015.150432] CR2: 00007fa5b4119028 CR3: 000000302ffc0000 CR4: 00000000003407e0 [236015.150433] Call Trace: [236015.150439] [] queued_spin_lock_slowpath+0xb/0xf [236015.150443] [] _raw_spin_lock+0x20/0x30 [236015.150459] [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] [236015.150468] [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] [236015.150472] [] ? account_entity_dequeue+0xae/0xd0 [236015.150475] [] ? ktime_get_ts64+0x52/0xf0 [236015.150487] [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] [236015.150499] [] ? cfs_hash_bd_lookup_intent+0x63/0x170 [libcfs] [236015.150532] [] ? lprocfs_counter_add+0xf9/0x160 [obdclass] [236015.150546] [] osd_ldiskfs_map_inode_pages+0x143/0x420 [osd_ldiskfs] [236015.150557] [] osd_read_prep+0x2de/0x400 [osd_ldiskfs] [236015.150580] [] mdt_obd_preprw+0xd9b/0x10a0 [mdt] [236015.150639] [] ? null_alloc_rs+0x16d/0x340 [ptlrpc] [236015.150683] [] tgt_brw_read+0x9db/0x1e50 [ptlrpc] [236015.150703] [] ? lprocfs_counter_add+0xf9/0x160 [obdclass] [236015.150741] [] ? null_alloc_rs+0x186/0x340 [ptlrpc] [236015.150777] [] ? lustre_pack_reply_v2+0x135/0x290 [ptlrpc] [236015.150812] [] ? lustre_pack_reply_flags+0x6f/0x1e0 [ptlrpc] [236015.150845] [] ? lustre_pack_reply+0x11/0x20 [ptlrpc] [236015.150885] [] tgt_request_handle+0xaea/0x1580 [ptlrpc] [236015.150923] [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [236015.150929] [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [236015.150965] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [236015.150999] [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [236015.151003] [] ? __wake_up+0x44/0x50 [236015.151037] [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [236015.151072] [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [236015.151075] [] kthread+0xd1/0xe0 [236015.151077] [] ? insert_kthread_work+0x40/0x40 [236015.151080] [] ret_from_fork_nospec_begin+0xe/0x21 [236015.151081] [] ? insert_kthread_work+0x40/0x40 [236015.151101] Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf b4 b4 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b [236015.159345] NMI watchdog: BUG: soft lockup - CPU#5 stuck for 23s! 
[mdt_rdpg01_017:70182] [236015.159394] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ses enclosure ghash_clmulni_intel dcdbas aesni_intel lrw gf128mul glue_helper ablk_helper ipmi_si cryptd sg ipmi_devintf pcspkr ccp ipmi_msghandler i2c_piix4 k10temp dm_multipath acpi_power_meter dm_mod ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) [236015.159409] ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt mlx5_core(OE) fb_sys_fops ttm mlxfw(OE) devlink ahci libahci mpt3sas(OE) drm tg3 crct10dif_pclmul mlx_compat(OE) crct10dif_common raid_class crc32c_intel libata ptp megaraid_sas scsi_transport_sas drm_panel_orientation_quirks pps_core [last unloaded: libcfs] [236015.159413] CPU: 5 PID: 70182 Comm: mdt_rdpg01_017 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 [236015.159414] Hardware name: Dell Inc. PowerEdge R6415/07YXFK, BIOS 1.10.6 08/15/2019 [236015.159415] task: ffff9a71b96d2080 ti: ffff9a6e85be8000 task.ti: ffff9a6e85be8000 [236015.159424] RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x1ce/0x200 [236015.159425] RSP: 0018:ffff9a6e85beb6c0 EFLAGS: 00000202 [236015.159426] RAX: 0000000000000001 RBX: ffff9a71bf65ab80 RCX: 0000000000000001 [236015.159427] RDX: 0000000000000101 RSI: 0000000000000001 RDI: ffff9a7b0c800c80 [236015.159428] RBP: ffff9a6e85beb6c0 R08: 0000000000000101 R09: ffffffffc156f68a [236015.159428] R10: ffff9a71bf65f140 R11: fffff4b484789200 R12: ffff9a71b96d2718 [236015.159429] R13: 0000000185beb638 R14: ffff9a71bf640000 R15: ffffffffb3e2a59e [236015.159431] FS: 00007f57196a8700(0000) GS:ffff9a71bf640000(0000) knlGS:0000000000000000 [236015.159432] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [236015.159433] CR2: 000000000208adf8 CR3: 00000020371ac000 CR4: 00000000003407e0 [236015.159434] Call Trace: [236015.159440] [] queued_spin_lock_slowpath+0xb/0xf [236015.159444] [] _raw_spin_lock+0x20/0x30 [236015.159458] [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] [236015.159467] [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] [236015.159477] [] ? dm_old_request_fn+0xcc/0x210 [dm_mod] [236015.159481] [] ? iova_rcache_get+0xba/0x140 [236015.159492] [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] [236015.159497] [] ? __brelse+0x3d/0x50 [236015.159507] [] ldiskfs_getblk+0x65/0x200 [ldiskfs] [236015.159517] [] ldiskfs_bread+0x27/0xc0 [ldiskfs] [236015.159519] [] ? __find_get_block+0xbc/0x120 [236015.159527] [] __ldiskfs_read_dirblock+0x4a/0x400 [ldiskfs] [236015.159534] [] htree_dirblock_to_tree+0x40/0x190 [ldiskfs] [236015.159542] [] ldiskfs_htree_fill_tree+0xb5/0x2f0 [ldiskfs] [236015.159546] [] ? kmem_cache_alloc_trace+0x1d6/0x200 [236015.159551] [] ldiskfs_readdir+0x61c/0x850 [ldiskfs] [236015.159562] [] ? osd_object_alloc+0x360/0x360 [osd_ldiskfs] [236015.159565] [] ? 
generic_getxattr+0x52/0x70 [236015.159572] [] osd_ldiskfs_it_fill+0xbe/0x260 [osd_ldiskfs] [236015.159579] [] osd_it_ea_load+0x37/0x100 [osd_ldiskfs] [236015.159591] [] lod_it_load+0x27/0x90 [lod] [236015.159621] [] dt_index_walk+0xf8/0x430 [obdclass] [236015.159635] [] ? mdd_object_lock+0xe0/0xe0 [mdd] [236015.159644] [] mdd_readpage+0x25f/0x5a0 [mdd] [236015.159660] [] mdt_readpage+0x63a/0x880 [mdt] [236015.159714] [] tgt_request_handle+0xaea/0x1580 [ptlrpc] [236015.159753] [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [236015.159760] [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [236015.159797] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [236015.159832] [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [236015.159838] [] ? __wake_up+0x44/0x50 [236015.159873] [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [236015.159907] [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [236015.159912] [] kthread+0xd1/0xe0 [236015.159913] [] ? insert_kthread_work+0x40/0x40 [236015.159917] [] ret_from_fork_nospec_begin+0xe/0x21 [236015.159919] [] ? insert_kthread_work+0x40/0x40 [236015.159939] Code: 37 81 fe 00 01 00 00 74 f4 e9 93 fe ff ff 0f 1f 80 00 00 00 00 83 fa 01 75 11 0f 1f 00 e9 68 fe ff ff 0f 1f 00 85 c0 74 0c f3 90 <8b> 07 0f b6 c0 83 f8 03 75 f0 b8 01 00 00 00 66 89 07 5d c3 66 [236015.167347] NMI watchdog: BUG: soft lockup - CPU#6 stuck for 23s! [mdt_rdpg02_008:68142] [236015.167396] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ses enclosure ghash_clmulni_intel dcdbas aesni_intel lrw gf128mul glue_helper ablk_helper ipmi_si cryptd sg ipmi_devintf pcspkr ccp ipmi_msghandler i2c_piix4 k10temp dm_multipath acpi_power_meter dm_mod ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) [236015.167413] ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt mlx5_core(OE) fb_sys_fops ttm mlxfw(OE) devlink ahci libahci mpt3sas(OE) drm tg3 crct10dif_pclmul mlx_compat(OE) crct10dif_common raid_class crc32c_intel libata ptp megaraid_sas scsi_transport_sas drm_panel_orientation_quirks pps_core [last unloaded: libcfs] [236015.167417] CPU: 6 PID: 68142 Comm: mdt_rdpg02_008 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 [236015.167418] Hardware name: Dell Inc. 
PowerEdge R6415/07YXFK, BIOS 1.10.6 08/15/2019 [236015.167420] task: ffff9a7d76bb1040 ti: ffff9a71a7ae4000 task.ti: ffff9a71a7ae4000 [236015.167428] RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 [236015.167430] RSP: 0018:ffff9a71a7ae7678 EFLAGS: 00000246 [236015.167431] RAX: 0000000000000000 RBX: 000004fb00000002 RCX: 0000000000310000 [236015.167432] RDX: ffff9a81bf75b780 RSI: 0000000000b10101 RDI: ffff9a7b0c800c80 [236015.167433] RBP: ffff9a71a7ae7678 R08: ffff9a81bf65b780 R09: 0000000000000000 [236015.167433] R10: ffff9a81bf65f140 R11: fffff4b4fa57d600 R12: ffff9a603e7eb000 [236015.167434] R13: ffff9a61b5290000 R14: ffff9a71a7ae7908 R15: ffff9a603e7eb328 [236015.167436] FS: 00007f418058c700(0000) GS:ffff9a81bf640000(0000) knlGS:0000000000000000 [236015.167437] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [236015.167437] CR2: 00007f41828ed000 CR3: 0000003c68610000 CR4: 00000000003407e0 [236015.167439] Call Trace: [236015.167445] [] queued_spin_lock_slowpath+0xb/0xf [236015.167449] [] _raw_spin_lock+0x20/0x30 [236015.167465] [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] [236015.167473] [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] [236015.167487] [] ? osd_write+0x15b/0x5c0 [osd_ldiskfs] [236015.167499] [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] [236015.167515] [] ? lod_sub_write+0x1d0/0x410 [lod] [236015.167579] [] ? tgt_last_rcvd_update+0x6be/0xc90 [ptlrpc] [236015.167590] [] ldiskfs_getblk+0x65/0x200 [ldiskfs] [236015.167601] [] ldiskfs_bread+0x27/0xc0 [ldiskfs] [236015.167608] [] __ldiskfs_read_dirblock+0x4a/0x400 [ldiskfs] [236015.167616] [] dx_probe+0xa2/0xa20 [ldiskfs] [236015.167626] [] ? __ldiskfs_get_inode_loc+0xe3/0x3c0 [ldiskfs] [236015.167632] [] ? ldiskfs_xattr_find_entry+0x9f/0x130 [ldiskfs] [236015.167639] [] ldiskfs_htree_fill_tree+0x199/0x2f0 [ldiskfs] [236015.167646] [] ldiskfs_readdir+0x61c/0x850 [ldiskfs] [236015.167653] [] ? osd_object_alloc+0x360/0x360 [osd_ldiskfs] [236015.167657] [] ? generic_getxattr+0x52/0x70 [236015.167665] [] osd_ldiskfs_it_fill+0xbe/0x260 [osd_ldiskfs] [236015.167672] [] osd_it_ea_load+0x37/0x100 [osd_ldiskfs] [236015.167680] [] lod_it_load+0x27/0x90 [lod] [236015.167715] [] dt_index_walk+0xf8/0x430 [obdclass] [236015.167729] [] ? mdd_object_lock+0xe0/0xe0 [mdd] [236015.167737] [] mdd_readpage+0x25f/0x5a0 [mdd] [236015.167750] [] mdt_readpage+0x63a/0x880 [mdt] [236015.167794] [] tgt_request_handle+0xaea/0x1580 [ptlrpc] [236015.167833] [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [236015.167843] [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [236015.167879] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [236015.167915] [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [236015.167919] [] ? __wake_up+0x44/0x50 [236015.167954] [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [236015.167956] [] ? __schedule+0x42a/0x860 [236015.167991] [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [236015.167994] [] kthread+0xd1/0xe0 [236015.167996] [] ? insert_kthread_work+0x40/0x40 [236015.167999] [] ret_from_fork_nospec_begin+0xe/0x21 [236015.168000] [] ? insert_kthread_work+0x40/0x40 [236015.168021] Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf b4 b4 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b [236015.183345] NMI watchdog: BUG: soft lockup - CPU#8 stuck for 23s! 
[mdt_rdpg00_004:68095] [236015.183391] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ses enclosure ghash_clmulni_intel dcdbas aesni_intel lrw gf128mul glue_helper ablk_helper ipmi_si cryptd sg ipmi_devintf pcspkr ccp ipmi_msghandler i2c_piix4 k10temp dm_multipath acpi_power_meter dm_mod ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) [236015.183406] ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt mlx5_core(OE) fb_sys_fops ttm mlxfw(OE) devlink ahci libahci mpt3sas(OE) drm tg3 crct10dif_pclmul mlx_compat(OE) crct10dif_common raid_class crc32c_intel libata ptp megaraid_sas scsi_transport_sas drm_panel_orientation_quirks pps_core [last unloaded: libcfs] [236015.183410] CPU: 8 PID: 68095 Comm: mdt_rdpg00_004 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 [236015.183411] Hardware name: Dell Inc. PowerEdge R6415/07YXFK, BIOS 1.10.6 08/15/2019 [236015.183412] task: ffff9a61a69bc100 ti: ffff9a602c23c000 task.ti: ffff9a602c23c000 [236015.183420] RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x126/0x200 [236015.183421] RSP: 0018:ffff9a602c23f6c0 EFLAGS: 00000246 [236015.183422] RAX: 0000000000000000 RBX: ffff9a83ed279680 RCX: 0000000000410000 [236015.183423] RDX: ffff9a81bf85b780 RSI: 0000000001310101 RDI: ffff9a7b0c800c80 [236015.183424] RBP: ffff9a602c23f6c0 R08: ffff9a61bee9b780 R09: 0000000000000000 [236015.183425] R10: ffff9a61bee9f140 R11: fffff4b46dba0400 R12: ffff9a602c23f690 [236015.183426] R13: ffff9a81ac460501 R14: 0000000000000001 R15: ffffffffc050b2d1 [236015.183427] FS: 00007fe2c4310900(0000) GS:ffff9a61bee80000(0000) knlGS:0000000000000000 [236015.183428] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [236015.183429] CR2: 00007fa75c3e3000 CR3: 0000003c68610000 CR4: 00000000003407e0 [236015.183430] Call Trace: [236015.183436] [] queued_spin_lock_slowpath+0xb/0xf [236015.183440] [] _raw_spin_lock+0x20/0x30 [236015.183457] [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] [236015.183465] [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] [236015.183480] [] ? osd_write+0x15b/0x5c0 [osd_ldiskfs] [236015.183492] [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] [236015.183497] [] ? __brelse+0x3d/0x50 [236015.183508] [] ldiskfs_getblk+0x65/0x200 [ldiskfs] [236015.183518] [] ldiskfs_bread+0x27/0xc0 [ldiskfs] [236015.183520] [] ? __find_get_block+0xbc/0x120 [236015.183527] [] __ldiskfs_read_dirblock+0x4a/0x400 [ldiskfs] [236015.183535] [] htree_dirblock_to_tree+0x40/0x190 [ldiskfs] [236015.183542] [] ldiskfs_htree_fill_tree+0xb5/0x2f0 [ldiskfs] [236015.183548] [] ? kmem_cache_alloc_trace+0x1d6/0x200 [236015.183555] [] ldiskfs_readdir+0x61c/0x850 [ldiskfs] [236015.183563] [] ? osd_object_alloc+0x360/0x360 [osd_ldiskfs] [236015.183566] [] ? generic_getxattr+0x52/0x70 [236015.183574] [] osd_ldiskfs_it_fill+0xbe/0x260 [osd_ldiskfs] [236015.183581] [] osd_it_ea_load+0x37/0x100 [osd_ldiskfs] [236015.183594] [] lod_it_load+0x27/0x90 [lod] [236015.183631] [] dt_index_walk+0xf8/0x430 [obdclass] [236015.183645] [] ? 
mdd_object_lock+0xe0/0xe0 [mdd] [236015.183654] [] mdd_readpage+0x25f/0x5a0 [mdd] [236015.183669] [] mdt_readpage+0x63a/0x880 [mdt] [236015.183734] [] tgt_request_handle+0xaea/0x1580 [ptlrpc] [236015.183774] [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [236015.183783] [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [236015.183819] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [236015.183855] [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [236015.183860] [] ? __wake_up+0x44/0x50 [236015.183896] [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [236015.183931] [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [236015.183935] [] kthread+0xd1/0xe0 [236015.183937] [] ? insert_kthread_work+0x40/0x40 [236015.183940] [] ret_from_fork_nospec_begin+0xe/0x21 [236015.183941] [] ? insert_kthread_work+0x40/0x40 [236015.183961] Code: 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf b4 b4 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 41 8b 40 08 <85> c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b 17 0f b7 c2 [236015.209346] NMI watchdog: BUG: soft lockup - CPU#11 stuck for 23s! [ldlm_cn03_007:72337] [236015.209387] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ses enclosure ghash_clmulni_intel dcdbas aesni_intel lrw gf128mul glue_helper ablk_helper ipmi_si cryptd sg ipmi_devintf pcspkr ccp ipmi_msghandler i2c_piix4 k10temp dm_multipath acpi_power_meter dm_mod ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) [236015.209401] ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt mlx5_core(OE) fb_sys_fops ttm mlxfw(OE) devlink ahci libahci mpt3sas(OE) drm tg3 crct10dif_pclmul mlx_compat(OE) crct10dif_common raid_class crc32c_intel libata ptp megaraid_sas scsi_transport_sas drm_panel_orientation_quirks pps_core [last unloaded: libcfs] [236015.209404] CPU: 11 PID: 72337 Comm: ldlm_cn03_007 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 [236015.209405] Hardware name: Dell Inc. 
PowerEdge R6415/07YXFK, BIOS 1.10.6 08/15/2019 [236015.209406] task: ffff9a81b2154100 ti: ffff9a61a179c000 task.ti: ffff9a61a179c000 [236015.209412] RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x128/0x200 [236015.209413] RSP: 0018:ffff9a61a179f880 EFLAGS: 00000246 [236015.209414] RAX: 0000000000000000 RBX: ffff9a6ab08ad6a8 RCX: 0000000000590000 [236015.209415] RDX: ffff9a71bf61b780 RSI: 0000000000090101 RDI: ffff9a7b0c800c80 [236015.209415] RBP: ffff9a61a179f880 R08: ffff9a91ff49b780 R09: 0000000000000000 [236015.209416] R10: ffff9a91ff49f0c0 R11: fffff4b5036965c0 R12: 0000000000000000 [236015.209417] R13: ffffffffc159c26c R14: ffff9a61a179f810 R15: ffff9a61a179f8c0 [236015.209418] FS: 00007f417d586700(0000) GS:ffff9a91ff480000(0000) knlGS:0000000000000000 [236015.209419] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [236015.209420] CR2: 00007f59c64f3080 CR3: 0000003c68610000 CR4: 00000000003407e0 [236015.209421] Call Trace: [236015.209425] [] queued_spin_lock_slowpath+0xb/0xf [236015.209428] [] _raw_spin_lock+0x20/0x30 [236015.209443] [] ldiskfs_es_lru_del+0x32/0x70 [ldiskfs] [236015.209453] [] ldiskfs_clear_inode+0x41/0x90 [ldiskfs] [236015.209462] [] ldiskfs_free_inode+0x10b/0x610 [ldiskfs] [236015.209473] [] ? ldiskfs_mark_iloc_dirty+0x68/0x80 [ldiskfs] [236015.209480] [] ? ldiskfs_orphan_del+0x171/0x240 [ldiskfs] [236015.209489] [] ? ldiskfs_evict_inode+0x472/0x630 [ldiskfs] [236015.209498] [] ? ldiskfs_mark_inode_dirty+0x6f/0x210 [ldiskfs] [236015.209506] [] ldiskfs_evict_inode+0x57d/0x630 [ldiskfs] [236015.209509] [] evict+0xb4/0x180 [236015.209510] [] iput+0xfc/0x190 [236015.209519] [] osd_object_delete+0x1d2/0x330 [osd_ldiskfs] [236015.209552] [] lu_object_free.isra.32+0x68/0x170 [obdclass] [236015.209562] [] ? cfs_hash_bd_from_key+0x38/0xb0 [libcfs] [236015.209582] [] lu_object_put+0xc5/0x3d0 [obdclass] [236015.209605] [] ? mdt_punch_hpreq_fini+0x10/0x10 [mdt] [236015.209617] [] ldlm_dom_discard_cp_ast+0xb1/0x2b0 [mdt] [236015.209655] [] ldlm_work_cp_ast_lock+0xa8/0x1d0 [ptlrpc] [236015.209691] [] ptlrpc_set_wait+0x72/0x790 [ptlrpc] [236015.209693] [] ? cpumask_next_and+0x35/0x50 [236015.209696] [] ? kmem_cache_alloc_node_trace+0x11d/0x210 [236015.209715] [] ? lprocfs_counter_add+0xf9/0x160 [obdclass] [236015.209741] [] ? ldlm_work_gl_ast_lock+0x3a0/0x3a0 [ptlrpc] [236015.209771] [] ? ptlrpc_prep_set+0xd2/0x280 [ptlrpc] [236015.209798] [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] [236015.209824] [] __ldlm_reprocess_all+0x11f/0x360 [ptlrpc] [236015.209851] [] ldlm_reprocess_all+0x13/0x20 [ptlrpc] [236015.209880] [] ldlm_request_cancel+0x42f/0x780 [ptlrpc] [236015.209909] [] ldlm_handle_cancel+0x232/0x2b0 [ptlrpc] [236015.209938] [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] [236015.209974] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [236015.210007] [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [236015.210009] [] ? __wake_up+0x44/0x50 [236015.210042] [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [236015.210044] [] ? __schedule+0x42a/0x860 [236015.210076] [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [236015.210078] [] kthread+0xd1/0xe0 [236015.210080] [] ? insert_kthread_work+0x40/0x40 [236015.210082] [] ret_from_fork_nospec_begin+0xe/0x21 [236015.210084] [] ? insert_kthread_work+0x40/0x40 [236015.210101] Code: 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf b4 b4 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 41 8b 40 08 85 c0 <74> f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b 17 0f b7 c2 85 c0 [236015.218343] NMI watchdog: BUG: soft lockup - CPU#12 stuck for 23s! 
[mdt_rdpg00_003:68077] [236015.218372] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ses enclosure ghash_clmulni_intel dcdbas aesni_intel lrw gf128mul glue_helper ablk_helper ipmi_si cryptd sg ipmi_devintf pcspkr ccp ipmi_msghandler i2c_piix4 k10temp dm_multipath acpi_power_meter dm_mod ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) [236015.218382] ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt mlx5_core(OE) fb_sys_fops ttm mlxfw(OE) devlink ahci libahci mpt3sas(OE) drm tg3 crct10dif_pclmul mlx_compat(OE) crct10dif_common raid_class crc32c_intel libata ptp megaraid_sas scsi_transport_sas drm_panel_orientation_quirks pps_core [last unloaded: libcfs] [236015.218385] CPU: 12 PID: 68077 Comm: mdt_rdpg00_003 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 [236015.218385] Hardware name: Dell Inc. PowerEdge R6415/07YXFK, BIOS 1.10.6 08/15/2019 [236015.218387] task: ffff9a52e9611040 ti: ffff9a613e618000 task.ti: ffff9a613e618000 [236015.218391] RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 [236015.218392] RSP: 0018:ffff9a613e61b6c0 EFLAGS: 00000246 [236015.218393] RAX: 0000000000000000 RBX: ffff9a613e61b718 RCX: 0000000000610000 [236015.218394] RDX: ffff9a81bf65b780 RSI: 0000000000310101 RDI: ffff9a7b0c800c80 [236015.218394] RBP: ffff9a613e61b6c0 R08: ffff9a61beedb780 R09: 0000000000000000 [236015.218395] R10: ffff9a61beedf140 R11: fffff4b45d3c5000 R12: 00000000000009a8 [236015.218396] R13: 0000000000000000 R14: ffffffffc159c71b R15: ffff9a613e61b678 [236015.218397] FS: 00007f3c307ae880(0000) GS:ffff9a61beec0000(0000) knlGS:0000000000000000 [236015.218398] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [236015.218399] CR2: 00007f7292311764 CR3: 000000302ffc0000 CR4: 00000000003407e0 [236015.218400] Call Trace: [236015.218403] [] queued_spin_lock_slowpath+0xb/0xf [236015.218405] [] _raw_spin_lock+0x20/0x30 [236015.218418] [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] [236015.218426] [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] [236015.218438] [] ? osd_ldiskfs_write_record+0x346/0x410 [osd_ldiskfs] [236015.218450] [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] [236015.218460] [] ? osd_write+0x15b/0x5c0 [osd_ldiskfs] [236015.218471] [] ldiskfs_getblk+0x65/0x200 [ldiskfs] [236015.218481] [] ldiskfs_bread+0x27/0xc0 [ldiskfs] [236015.218485] [] ? __find_get_block+0xbc/0x120 [236015.218493] [] __ldiskfs_read_dirblock+0x4a/0x400 [ldiskfs] [236015.218500] [] htree_dirblock_to_tree+0x40/0x190 [ldiskfs] [236015.218508] [] ldiskfs_htree_fill_tree+0xb5/0x2f0 [ldiskfs] [236015.218514] [] ? ldiskfs_readdir+0x799/0x850 [ldiskfs] [236015.218518] [] ? kmem_cache_alloc_trace+0x1d6/0x200 [236015.218524] [] ldiskfs_readdir+0x61c/0x850 [ldiskfs] [236015.218532] [] ? osd_object_alloc+0x360/0x360 [osd_ldiskfs] [236015.218534] [] ? 
generic_getxattr+0x52/0x70 [236015.218541] [] osd_ldiskfs_it_fill+0xbe/0x260 [osd_ldiskfs] [236015.218548] [] osd_it_ea_load+0x37/0x100 [osd_ldiskfs] [236015.218561] [] lod_it_load+0x27/0x90 [lod] [236015.218585] [] dt_index_walk+0xf8/0x430 [obdclass] [236015.218599] [] ? mdd_object_lock+0xe0/0xe0 [mdd] [236015.218607] [] mdd_readpage+0x25f/0x5a0 [mdd] [236015.218618] [] mdt_readpage+0x63a/0x880 [mdt] [236015.218661] [] tgt_request_handle+0xaea/0x1580 [ptlrpc] [236015.218701] [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [236015.218707] [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [236015.218744] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [236015.218780] [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [236015.218782] [] ? __wake_up+0x44/0x50 [236015.218817] [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [236015.218820] [] ? __schedule+0x42a/0x860 [236015.218855] [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [236015.218856] [] kthread+0xd1/0xe0 [236015.218858] [] ? insert_kthread_work+0x40/0x40 [236015.218860] [] ret_from_fork_nospec_begin+0xe/0x21 [236015.218862] [] ? insert_kthread_work+0x40/0x40 [236015.218882] Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf b4 b4 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b [236015.228345] NMI watchdog: BUG: soft lockup - CPU#13 stuck for 23s! [mdt_rdpg01_010:68127] [236015.228384] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ses enclosure ghash_clmulni_intel dcdbas aesni_intel lrw gf128mul glue_helper ablk_helper ipmi_si cryptd sg ipmi_devintf pcspkr ccp ipmi_msghandler i2c_piix4 k10temp dm_multipath acpi_power_meter dm_mod ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) [236015.228395] ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt mlx5_core(OE) fb_sys_fops ttm mlxfw(OE) devlink ahci libahci mpt3sas(OE) drm tg3 crct10dif_pclmul mlx_compat(OE) crct10dif_common raid_class crc32c_intel libata ptp megaraid_sas scsi_transport_sas drm_panel_orientation_quirks pps_core [last unloaded: libcfs] [236015.228399] CPU: 13 PID: 68127 Comm: mdt_rdpg01_010 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 [236015.228399] Hardware name: Dell Inc. 
PowerEdge R6415/07YXFK, BIOS 1.10.6 08/15/2019 [236015.228401] task: ffff9a71a1bc8000 ti: ffff9a71b860c000 task.ti: ffff9a71b860c000 [236015.228407] RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 [236015.228408] RSP: 0018:ffff9a71b860f6c0 EFLAGS: 00000246 [236015.228409] RAX: 0000000000000000 RBX: 0000000000000010 RCX: 0000000000690000 [236015.228410] RDX: ffff9a61bef9b780 RSI: 0000000000c10101 RDI: ffff9a7b0c800c80 [236015.228411] RBP: ffff9a71b860f6c0 R08: ffff9a71bf6db780 R09: 0000000000000000 [236015.228412] R10: ffff9a71bf6df140 R11: fffff4b484780a00 R12: ffff9a71fff5a000 [236015.228412] R13: 0000000000000000 R14: ffff9a71bf840000 R15: ffffffff00000141 [236015.228414] FS: 00007f32e0171700(0000) GS:ffff9a71bf6c0000(0000) knlGS:0000000000000000 [236015.228414] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [236015.228415] CR2: 00007f1077319000 CR3: 0000003c68610000 CR4: 00000000003407e0 [236015.228416] Call Trace: [236015.228420] [] queued_spin_lock_slowpath+0xb/0xf [236015.228423] [] _raw_spin_lock+0x20/0x30 [236015.228436] [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] [236015.228444] [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] [236015.228456] [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] [236015.228459] [] ? __brelse+0x3d/0x50 [236015.228469] [] ldiskfs_getblk+0x65/0x200 [ldiskfs] [236015.228479] [] ldiskfs_bread+0x27/0xc0 [ldiskfs] [236015.228481] [] ? __find_get_block+0xbc/0x120 [236015.228488] [] __ldiskfs_read_dirblock+0x4a/0x400 [ldiskfs] [236015.228496] [] htree_dirblock_to_tree+0x40/0x190 [ldiskfs] [236015.228503] [] ldiskfs_htree_fill_tree+0xb5/0x2f0 [ldiskfs] [236015.228506] [] ? kmem_cache_alloc_trace+0x1d6/0x200 [236015.228512] [] ldiskfs_readdir+0x61c/0x850 [ldiskfs] [236015.228521] [] ? osd_object_alloc+0x360/0x360 [osd_ldiskfs] [236015.228523] [] ? generic_getxattr+0x52/0x70 [236015.228530] [] osd_ldiskfs_it_fill+0xbe/0x260 [osd_ldiskfs] [236015.228538] [] osd_it_ea_load+0x37/0x100 [osd_ldiskfs] [236015.228548] [] lod_it_load+0x27/0x90 [lod] [236015.228577] [] dt_index_walk+0xf8/0x430 [obdclass] [236015.228587] [] ? mdd_object_lock+0xe0/0xe0 [mdd] [236015.228595] [] mdd_readpage+0x25f/0x5a0 [mdd] [236015.228609] [] mdt_readpage+0x63a/0x880 [mdt] [236015.228654] [] tgt_request_handle+0xaea/0x1580 [ptlrpc] [236015.228692] [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [236015.228699] [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [236015.228735] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [236015.228770] [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [236015.228772] [] ? __wake_up+0x44/0x50 [236015.228807] [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [236015.228809] [] ? __schedule+0x42a/0x860 [236015.228843] [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [236015.228845] [] kthread+0xd1/0xe0 [236015.228847] [] ? insert_kthread_work+0x40/0x40 [236015.228849] [] ret_from_fork_nospec_begin+0xe/0x21 [236015.228850] [] ? insert_kthread_work+0x40/0x40 [236015.228869] Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf b4 b4 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b [236015.253343] NMI watchdog: BUG: soft lockup - CPU#16 stuck for 23s! 
[mdt_rdpg00_005:68125] [236015.253372] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ses enclosure ghash_clmulni_intel dcdbas aesni_intel lrw gf128mul glue_helper ablk_helper ipmi_si cryptd sg ipmi_devintf pcspkr ccp ipmi_msghandler i2c_piix4 k10temp dm_multipath acpi_power_meter dm_mod ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) [236015.253381] ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt mlx5_core(OE) fb_sys_fops ttm mlxfw(OE) devlink ahci libahci mpt3sas(OE) drm tg3 crct10dif_pclmul mlx_compat(OE) crct10dif_common raid_class crc32c_intel libata ptp megaraid_sas scsi_transport_sas drm_panel_orientation_quirks pps_core [last unloaded: libcfs] [236015.253384] CPU: 16 PID: 68125 Comm: mdt_rdpg00_005 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 [236015.253385] Hardware name: Dell Inc. PowerEdge R6415/07YXFK, BIOS 1.10.6 08/15/2019 [236015.253386] task: ffff9a61be08e180 ti: ffff9a60e2368000 task.ti: ffff9a60e2368000 [236015.253390] RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x126/0x200 [236015.253391] RSP: 0018:ffff9a60e236b6c0 EFLAGS: 00000246 [236015.253392] RAX: 0000000000000000 RBX: ffff9a83ed2797b8 RCX: 0000000000810000 [236015.253393] RDX: ffff9a81bf7db780 RSI: 0000000000f10101 RDI: ffff9a7b0c800c80 [236015.253394] RBP: ffff9a60e236b6c0 R08: ffff9a61bef1b780 R09: 0000000000000000 [236015.253394] R10: ffff9a61bef1f140 R11: fffff4b444061200 R12: ffff9a60e236b690 [236015.253395] R13: ffff9a81ac460501 R14: 0000000000000001 R15: ffffffffc050b2d1 [236015.253396] FS: 00007f80cc816740(0000) GS:ffff9a61bef00000(0000) knlGS:0000000000000000 [236015.253397] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [236015.253398] CR2: 00007f80c58d7140 CR3: 0000003c68610000 CR4: 00000000003407e0 [236015.253399] Call Trace: [236015.253402] [] queued_spin_lock_slowpath+0xb/0xf [236015.253404] [] _raw_spin_lock+0x20/0x30 [236015.253417] [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] [236015.253425] [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] [236015.253436] [] ? osd_write+0x15b/0x5c0 [osd_ldiskfs] [236015.253448] [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] [236015.253450] [] ? __brelse+0x3d/0x50 [236015.253460] [] ldiskfs_getblk+0x65/0x200 [ldiskfs] [236015.253470] [] ldiskfs_bread+0x27/0xc0 [ldiskfs] [236015.253472] [] ? __find_get_block+0xbc/0x120 [236015.253480] [] __ldiskfs_read_dirblock+0x4a/0x400 [ldiskfs] [236015.253487] [] htree_dirblock_to_tree+0x40/0x190 [ldiskfs] [236015.253495] [] ldiskfs_htree_fill_tree+0xb5/0x2f0 [ldiskfs] [236015.253497] [] ? kmem_cache_alloc_trace+0x1d6/0x200 [236015.253503] [] ldiskfs_readdir+0x61c/0x850 [ldiskfs] [236015.253510] [] ? osd_object_alloc+0x360/0x360 [osd_ldiskfs] [236015.253512] [] ? generic_getxattr+0x52/0x70 [236015.253519] [] osd_ldiskfs_it_fill+0xbe/0x260 [osd_ldiskfs] [236015.253526] [] osd_it_ea_load+0x37/0x100 [osd_ldiskfs] [236015.253535] [] lod_it_load+0x27/0x90 [lod] [236015.253557] [] dt_index_walk+0xf8/0x430 [obdclass] [236015.253567] [] ? 
mdd_object_lock+0xe0/0xe0 [mdd] [236015.253575] [] mdd_readpage+0x25f/0x5a0 [mdd] [236015.253586] [] mdt_readpage+0x63a/0x880 [mdt] [236015.253628] [] tgt_request_handle+0xaea/0x1580 [ptlrpc] [236015.253667] [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [236015.253673] [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [236015.253709] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [236015.253745] [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [236015.253747] [] ? __wake_up+0x44/0x50 [236015.253781] [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [236015.253816] [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [236015.253817] [] kthread+0xd1/0xe0 [236015.253819] [] ? insert_kthread_work+0x40/0x40 [236015.253821] [] ret_from_fork_nospec_begin+0xe/0x21 [236015.253823] [] ? insert_kthread_work+0x40/0x40 [236015.253842] Code: 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf b4 b4 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 41 8b 40 08 <85> c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b 17 0f b7 c2 [236015.262345] NMI watchdog: BUG: soft lockup - CPU#17 stuck for 23s! [mdt_rdpg01_009:68124] [236015.262375] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ses enclosure ghash_clmulni_intel dcdbas aesni_intel lrw gf128mul glue_helper ablk_helper ipmi_si cryptd sg ipmi_devintf pcspkr ccp ipmi_msghandler i2c_piix4 k10temp dm_multipath acpi_power_meter dm_mod ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) [236015.262384] ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt mlx5_core(OE) fb_sys_fops ttm mlxfw(OE) devlink ahci libahci mpt3sas(OE) drm tg3 crct10dif_pclmul mlx_compat(OE) crct10dif_common raid_class crc32c_intel libata ptp megaraid_sas scsi_transport_sas drm_panel_orientation_quirks pps_core [last unloaded: libcfs] [236015.262387] CPU: 17 PID: 68124 Comm: mdt_rdpg01_009 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 [236015.262388] Hardware name: Dell Inc. 
PowerEdge R6415/07YXFK, BIOS 1.10.6 08/15/2019 [236015.262389] task: ffff9a61be08d140 ti: ffff9a6036bd8000 task.ti: ffff9a6036bd8000 [236015.262394] RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 [236015.262395] RSP: 0018:ffff9a6036bdb6c0 EFLAGS: 00000246 [236015.262396] RAX: 0000000000000000 RBX: ffff9a83ed279888 RCX: 0000000000890000 [236015.262397] RDX: ffff9a71bf75b780 RSI: 0000000000a90101 RDI: ffff9a7b0c800c80 [236015.262398] RBP: ffff9a6036bdb6c0 R08: ffff9a71bf71b780 R09: 0000000000000000 [236015.262398] R10: ffff9a71bf71f140 R11: fffff4b4912db000 R12: ffff9a6036bdb690 [236015.262399] R13: ffff9a81ac460501 R14: 0000000000000001 R15: ffffffffc050b2d1 [236015.262401] FS: 00007f32e0150700(0000) GS:ffff9a71bf700000(0000) knlGS:0000000000000000 [236015.262401] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [236015.262402] CR2: 00007f32e0224000 CR3: 0000003c68610000 CR4: 00000000003407e0 [236015.262403] Call Trace: [236015.262406] [] queued_spin_lock_slowpath+0xb/0xf [236015.262409] [] _raw_spin_lock+0x20/0x30 [236015.262421] [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] [236015.262430] [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] [236015.262442] [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] [236015.262444] [] ? __brelse+0x3d/0x50 [236015.262455] [] ldiskfs_getblk+0x65/0x200 [ldiskfs] [236015.262465] [] ldiskfs_bread+0x27/0xc0 [ldiskfs] [236015.262467] [] ? __find_get_block+0xbc/0x120 [236015.262474] [] __ldiskfs_read_dirblock+0x4a/0x400 [ldiskfs] [236015.262482] [] htree_dirblock_to_tree+0x40/0x190 [ldiskfs] [236015.262489] [] ldiskfs_htree_fill_tree+0xb5/0x2f0 [ldiskfs] [236015.262492] [] ? kmem_cache_alloc_trace+0x1d6/0x200 [236015.262498] [] ldiskfs_readdir+0x61c/0x850 [ldiskfs] [236015.262506] [] ? osd_object_alloc+0x360/0x360 [osd_ldiskfs] [236015.262508] [] ? generic_getxattr+0x52/0x70 [236015.262515] [] osd_ldiskfs_it_fill+0xbe/0x260 [osd_ldiskfs] [236015.262522] [] osd_it_ea_load+0x37/0x100 [osd_ldiskfs] [236015.262531] [] lod_it_load+0x27/0x90 [lod] [236015.262554] [] dt_index_walk+0xf8/0x430 [obdclass] [236015.262564] [] ? mdd_object_lock+0xe0/0xe0 [mdd] [236015.262572] [] mdd_readpage+0x25f/0x5a0 [mdd] [236015.262584] [] mdt_readpage+0x63a/0x880 [mdt] [236015.262627] [] tgt_request_handle+0xaea/0x1580 [ptlrpc] [236015.262666] [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [236015.262672] [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [236015.262708] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [236015.262743] [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [236015.262746] [] ? __wake_up+0x44/0x50 [236015.262781] [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [236015.262783] [] ? __schedule+0x42a/0x860 [236015.262817] [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [236015.262819] [] kthread+0xd1/0xe0 [236015.262821] [] ? insert_kthread_work+0x40/0x40 [236015.262823] [] ret_from_fork_nospec_begin+0xe/0x21 [236015.262825] [] ? insert_kthread_work+0x40/0x40 [236015.262844] Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf b4 b4 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b [236015.288345] NMI watchdog: BUG: soft lockup - CPU#20 stuck for 22s! 
[mdt_rdpg00_015:70188] [236015.288370] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ses enclosure ghash_clmulni_intel dcdbas aesni_intel lrw gf128mul glue_helper ablk_helper ipmi_si cryptd sg ipmi_devintf pcspkr ccp ipmi_msghandler i2c_piix4 k10temp dm_multipath acpi_power_meter dm_mod ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) [236015.288379] ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt mlx5_core(OE) fb_sys_fops ttm mlxfw(OE) devlink ahci libahci mpt3sas(OE) drm tg3 crct10dif_pclmul mlx_compat(OE) crct10dif_common raid_class crc32c_intel libata ptp megaraid_sas scsi_transport_sas drm_panel_orientation_quirks pps_core [last unloaded: libcfs] [236015.288381] CPU: 20 PID: 70188 Comm: mdt_rdpg00_015 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 [236015.288382] Hardware name: Dell Inc. PowerEdge R6415/07YXFK, BIOS 1.10.6 08/15/2019 [236015.288383] task: ffff9a8e81a72080 ti: ffff9a5fd0780000 task.ti: ffff9a5fd0780000 [236015.288386] RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 [236015.288387] RSP: 0018:ffff9a5fd07836c0 EFLAGS: 00000246 [236015.288388] RAX: 0000000000000000 RBX: ffff9a5fd0783718 RCX: 0000000000a10000 [236015.288389] RDX: ffff9a61befdb780 RSI: 0000000000e10101 RDI: ffff9a7b0c800c80 [236015.288389] RBP: ffff9a5fd07836c0 R08: ffff9a61bef5b780 R09: 0000000000000000 [236015.288390] R10: ffff9a61bef5f140 R11: fffff4b460883200 R12: 00000000000009a8 [236015.288391] R13: 0000000000000000 R14: ffffffffc159c71b R15: ffff9a5fd0783678 [236015.288392] FS: 00007f3c307ae880(0000) GS:ffff9a61bef40000(0000) knlGS:0000000000000000 [236015.288393] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [236015.288394] CR2: 00007f51c51d4080 CR3: 000000302ffc0000 CR4: 00000000003407e0 [236015.288394] Call Trace: [236015.288397] [] queued_spin_lock_slowpath+0xb/0xf [236015.288399] [] _raw_spin_lock+0x20/0x30 [236015.288410] [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] [236015.288419] [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] [236015.288430] [] ? osd_ldiskfs_write_record+0x346/0x410 [osd_ldiskfs] [236015.288441] [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] [236015.288443] [] ? __brelse+0x3d/0x50 [236015.288453] [] ldiskfs_getblk+0x65/0x200 [ldiskfs] [236015.288462] [] ldiskfs_bread+0x27/0xc0 [ldiskfs] [236015.288464] [] ? __find_get_block+0xbc/0x120 [236015.288471] [] __ldiskfs_read_dirblock+0x4a/0x400 [ldiskfs] [236015.288478] [] htree_dirblock_to_tree+0x40/0x190 [ldiskfs] [236015.288485] [] ldiskfs_htree_fill_tree+0xb5/0x2f0 [ldiskfs] [236015.288487] [] ? kmem_cache_alloc_trace+0x1d6/0x200 [236015.288493] [] ldiskfs_readdir+0x61c/0x850 [ldiskfs] [236015.288500] [] ? osd_object_alloc+0x360/0x360 [osd_ldiskfs] [236015.288501] [] ? generic_getxattr+0x52/0x70 [236015.288508] [] osd_ldiskfs_it_fill+0xbe/0x260 [osd_ldiskfs] [236015.288515] [] osd_it_ea_load+0x37/0x100 [osd_ldiskfs] [236015.288523] [] lod_it_load+0x27/0x90 [lod] [236015.288544] [] dt_index_walk+0xf8/0x430 [obdclass] [236015.288552] [] ? 
mdd_object_lock+0xe0/0xe0 [mdd] [236015.288559] [] mdd_readpage+0x25f/0x5a0 [mdd] [236015.288570] [] mdt_readpage+0x63a/0x880 [mdt] [236015.288610] [] tgt_request_handle+0xaea/0x1580 [ptlrpc] [236015.288646] [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [236015.288653] [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [236015.288687] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [236015.288722] [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [236015.288724] [] ? __wake_up+0x44/0x50 [236015.288757] [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [236015.288759] [] ? __schedule+0x42a/0x860 [236015.288791] [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [236015.288793] [] kthread+0xd1/0xe0 [236015.288795] [] ? insert_kthread_work+0x40/0x40 [236015.288796] [] ret_from_fork_nospec_begin+0xe/0x21 [236015.288798] [] ? insert_kthread_work+0x40/0x40 [236015.288815] Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf b4 b4 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b [236015.296350] NMI watchdog: BUG: soft lockup - CPU#21 stuck for 22s! [mdt_rdpg01_001:67227] [236015.296389] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ses enclosure ghash_clmulni_intel dcdbas aesni_intel lrw gf128mul glue_helper ablk_helper ipmi_si cryptd sg ipmi_devintf pcspkr ccp ipmi_msghandler i2c_piix4 k10temp dm_multipath acpi_power_meter dm_mod ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) [236015.296401] ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt mlx5_core(OE) fb_sys_fops ttm mlxfw(OE) devlink ahci libahci mpt3sas(OE) drm tg3 crct10dif_pclmul mlx_compat(OE) crct10dif_common raid_class crc32c_intel libata ptp megaraid_sas scsi_transport_sas drm_panel_orientation_quirks pps_core [last unloaded: libcfs] [236015.296405] CPU: 21 PID: 67227 Comm: mdt_rdpg01_001 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 [236015.296405] Hardware name: Dell Inc. 
PowerEdge R6415/07YXFK, BIOS 1.10.6 08/15/2019 [236015.296407] task: ffff9a61b21fd140 ti: ffff9a81a5e10000 task.ti: ffff9a81a5e10000 [236015.296413] RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 [236015.296414] RSP: 0018:ffff9a81a5e136c0 EFLAGS: 00000246 [236015.296415] RAX: 0000000000000000 RBX: ffff9a83ed279888 RCX: 0000000000a90000 [236015.296416] RDX: ffff9a61bf05b780 RSI: 0000000001210101 RDI: ffff9a7b0c800c80 [236015.296417] RBP: ffff9a81a5e136c0 R08: ffff9a71bf75b780 R09: 0000000000000000 [236015.296418] R10: ffff9a71bf75f140 R11: fffff4b48ef9fc00 R12: ffff9a81a5e13690 [236015.296419] R13: ffff9a81ac460501 R14: 0000000000000001 R15: ffffffffc050b2d1 [236015.296420] FS: 00007f32e0171700(0000) GS:ffff9a71bf740000(0000) knlGS:0000000000000000 [236015.296421] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [236015.296422] CR2: 00007f1073054000 CR3: 0000003c68610000 CR4: 00000000003407e0 [236015.296423] Call Trace: [236015.296427] [] queued_spin_lock_slowpath+0xb/0xf [236015.296431] [] _raw_spin_lock+0x20/0x30 [236015.296445] [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] [236015.296454] [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] [236015.296468] [] ? osd_write+0x15b/0x5c0 [osd_ldiskfs] [236015.296480] [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] [236015.296484] [] ? __brelse+0x3d/0x50 [236015.296495] [] ldiskfs_getblk+0x65/0x200 [ldiskfs] [236015.296506] [] ldiskfs_bread+0x27/0xc0 [ldiskfs] [236015.296508] [] ? __find_get_block+0xbc/0x120 [236015.296516] [] __ldiskfs_read_dirblock+0x4a/0x400 [ldiskfs] [236015.296523] [] htree_dirblock_to_tree+0x40/0x190 [ldiskfs] [236015.296531] [] ldiskfs_htree_fill_tree+0xb5/0x2f0 [ldiskfs] [236015.296535] [] ? kmem_cache_alloc_trace+0x1d6/0x200 [236015.296541] [] ldiskfs_readdir+0x61c/0x850 [ldiskfs] [236015.296549] [] ? osd_object_alloc+0x360/0x360 [osd_ldiskfs] [236015.296551] [] ? generic_getxattr+0x52/0x70 [236015.296558] [] osd_ldiskfs_it_fill+0xbe/0x260 [osd_ldiskfs] [236015.296566] [] osd_it_ea_load+0x37/0x100 [osd_ldiskfs] [236015.296577] [] lod_it_load+0x27/0x90 [lod] [236015.296605] [] dt_index_walk+0xf8/0x430 [obdclass] [236015.296616] [] ? mdd_object_lock+0xe0/0xe0 [mdd] [236015.296625] [] mdd_readpage+0x25f/0x5a0 [mdd] [236015.296639] [] mdt_readpage+0x63a/0x880 [mdt] [236015.296687] [] tgt_request_handle+0xaea/0x1580 [ptlrpc] [236015.296728] [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [236015.296736] [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [236015.296774] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [236015.296812] [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [236015.296814] [] ? __wake_up+0x44/0x50 [236015.296851] [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [236015.296887] [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [236015.296889] [] kthread+0xd1/0xe0 [236015.296891] [] ? insert_kthread_work+0x40/0x40 [236015.296893] [] ret_from_fork_nospec_begin+0xe/0x21 [236015.296895] [] ? insert_kthread_work+0x40/0x40 [236015.296915] Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf b4 b4 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b [236015.304346] NMI watchdog: BUG: soft lockup - CPU#22 stuck for 22s! 
[mdt_rdpg02_000:67228] [236015.304375] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ses enclosure ghash_clmulni_intel dcdbas aesni_intel lrw gf128mul glue_helper ablk_helper ipmi_si cryptd sg ipmi_devintf pcspkr ccp ipmi_msghandler i2c_piix4 k10temp dm_multipath acpi_power_meter dm_mod ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) [236015.304385] ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt mlx5_core(OE) fb_sys_fops ttm mlxfw(OE) devlink ahci libahci mpt3sas(OE) drm tg3 crct10dif_pclmul mlx_compat(OE) crct10dif_common raid_class crc32c_intel libata ptp megaraid_sas scsi_transport_sas drm_panel_orientation_quirks pps_core [last unloaded: libcfs] [236015.304387] CPU: 22 PID: 67228 Comm: mdt_rdpg02_000 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 [236015.304388] Hardware name: Dell Inc. PowerEdge R6415/07YXFK, BIOS 1.10.6 08/15/2019 [236015.304389] task: ffff9a61b21f9040 ti: ffff9a81a5e14000 task.ti: ffff9a81a5e14000 [236015.304393] RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 [236015.304395] RSP: 0018:ffff9a81a5e176c0 EFLAGS: 00000246 [236015.304396] RAX: 0000000000000000 RBX: ffff9a81a5e17718 RCX: 0000000000b10000 [236015.304396] RDX: ffff9a91ff49b780 RSI: 0000000000590101 RDI: ffff9a7b0c800c80 [236015.304397] RBP: ffff9a81a5e176c0 R08: ffff9a81bf75b780 R09: 0000000000000000 [236015.304398] R10: ffff9a81bf75f140 R11: fffff4b4e0be1e00 R12: 00000000000009a8 [236015.304399] R13: 0000000000000000 R14: ffffffffc159c71b R15: ffff9a81a5e17678 [236015.304400] FS: 00007fc46f387880(0000) GS:ffff9a81bf740000(0000) knlGS:0000000000000000 [236015.304401] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [236015.304402] CR2: 00007fc45c7d82b4 CR3: 0000004035a96000 CR4: 00000000003407e0 [236015.304403] Call Trace: [236015.304405] [] queued_spin_lock_slowpath+0xb/0xf [236015.304408] [] _raw_spin_lock+0x20/0x30 [236015.304421] [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] [236015.304429] [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] [236015.304440] [] ? osd_ldiskfs_write_record+0x346/0x410 [osd_ldiskfs] [236015.304452] [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] [236015.304455] [] ? __brelse+0x3d/0x50 [236015.304466] [] ldiskfs_getblk+0x65/0x200 [ldiskfs] [236015.304476] [] ldiskfs_bread+0x27/0xc0 [ldiskfs] [236015.304478] [] ? __find_get_block+0xbc/0x120 [236015.304485] [] __ldiskfs_read_dirblock+0x4a/0x400 [ldiskfs] [236015.304493] [] htree_dirblock_to_tree+0x40/0x190 [ldiskfs] [236015.304500] [] ldiskfs_htree_fill_tree+0xb5/0x2f0 [ldiskfs] [236015.304504] [] ? kmem_cache_alloc_trace+0x1d6/0x200 [236015.304510] [] ldiskfs_readdir+0x61c/0x850 [ldiskfs] [236015.304518] [] ? osd_object_alloc+0x360/0x360 [osd_ldiskfs] [236015.304519] [] ? generic_getxattr+0x52/0x70 [236015.304526] [] osd_ldiskfs_it_fill+0xbe/0x260 [osd_ldiskfs] [236015.304534] [] osd_it_ea_load+0x37/0x100 [osd_ldiskfs] [236015.304542] [] lod_it_load+0x27/0x90 [lod] [236015.304565] [] dt_index_walk+0xf8/0x430 [obdclass] [236015.304574] [] ? 
mdd_object_lock+0xe0/0xe0 [mdd] [236015.304583] [] mdd_readpage+0x25f/0x5a0 [mdd] [236015.304593] [] mdt_readpage+0x63a/0x880 [mdt] [236015.304635] [] tgt_request_handle+0xaea/0x1580 [ptlrpc] [236015.304674] [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [236015.304680] [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [236015.304717] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [236015.304752] [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [236015.304754] [] ? __wake_up+0x44/0x50 [236015.304789] [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [236015.304791] [] ? __schedule+0x42a/0x860 [236015.304825] [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [236015.304827] [] kthread+0xd1/0xe0 [236015.304829] [] ? insert_kthread_work+0x40/0x40 [236015.304831] [] ret_from_fork_nospec_begin+0xe/0x21 [236015.304833] [] ? insert_kthread_work+0x40/0x40 [236015.304852] Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf b4 b4 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b [236015.322345] NMI watchdog: BUG: soft lockup - CPU#24 stuck for 22s! [mdt_rdpg00_013:70186] [236015.322372] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ses enclosure ghash_clmulni_intel dcdbas aesni_intel lrw gf128mul glue_helper ablk_helper ipmi_si cryptd sg ipmi_devintf pcspkr ccp ipmi_msghandler i2c_piix4 k10temp dm_multipath acpi_power_meter dm_mod ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) [236015.322381] ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt mlx5_core(OE) fb_sys_fops ttm mlxfw(OE) devlink ahci libahci mpt3sas(OE) drm tg3 crct10dif_pclmul mlx_compat(OE) crct10dif_common raid_class crc32c_intel libata ptp megaraid_sas scsi_transport_sas drm_panel_orientation_quirks pps_core [last unloaded: libcfs] [236015.322384] CPU: 24 PID: 70186 Comm: mdt_rdpg00_013 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 [236015.322384] Hardware name: Dell Inc. 
PowerEdge R6415/07YXFK, BIOS 1.10.6 08/15/2019 [236015.322386] task: ffff9a71b96d4100 ti: ffff9a6d39bb8000 task.ti: ffff9a6d39bb8000 [236015.322389] RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 [236015.322391] RSP: 0018:ffff9a6d39bbb6c0 EFLAGS: 00000246 [236015.322391] RAX: 0000000000000000 RBX: ffff9a61bef9ab80 RCX: 0000000000c10000 [236015.322392] RDX: ffff9a61bef1b780 RSI: 0000000000810101 RDI: ffff9a7b0c800c80 [236015.322393] RBP: ffff9a6d39bbb6c0 R08: ffff9a61bef9b780 R09: 0000000000000000 [236015.322394] R10: ffff9a61bef9f140 R11: fffff4b46a5d7000 R12: ffff9a71b96d4798 [236015.322394] R13: 0000000139bbb638 R14: ffff9a61bef80000 R15: ffffffffb3e2a59e [236015.322396] FS: 00007fa33781e740(0000) GS:ffff9a61bef80000(0000) knlGS:0000000000000000 [236015.322397] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [236015.322397] CR2: 00007fa3374091cc CR3: 0000003c68610000 CR4: 00000000003407e0 [236015.322398] Call Trace: [236015.322401] [] queued_spin_lock_slowpath+0xb/0xf [236015.322403] [] _raw_spin_lock+0x20/0x30 [236015.322415] [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] [236015.322423] [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] [236015.322435] [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] [236015.322437] [] ? __brelse+0x3d/0x50 [236015.322447] [] ldiskfs_getblk+0x65/0x200 [ldiskfs] [236015.322456] [] ldiskfs_bread+0x27/0xc0 [ldiskfs] [236015.322458] [] ? __find_get_block+0xbc/0x120 [236015.322466] [] __ldiskfs_read_dirblock+0x4a/0x400 [ldiskfs] [236015.322473] [] htree_dirblock_to_tree+0x40/0x190 [ldiskfs] [236015.322480] [] ldiskfs_htree_fill_tree+0xb5/0x2f0 [ldiskfs] [236015.322482] [] ? kmem_cache_alloc_trace+0x1d6/0x200 [236015.322488] [] ldiskfs_readdir+0x61c/0x850 [ldiskfs] [236015.322496] [] ? osd_object_alloc+0x360/0x360 [osd_ldiskfs] [236015.322497] [] ? generic_getxattr+0x52/0x70 [236015.322504] [] osd_ldiskfs_it_fill+0xbe/0x260 [osd_ldiskfs] [236015.322511] [] osd_it_ea_load+0x37/0x100 [osd_ldiskfs] [236015.322520] [] lod_it_load+0x27/0x90 [lod] [236015.322541] [] dt_index_walk+0xf8/0x430 [obdclass] [236015.322550] [] ? mdd_object_lock+0xe0/0xe0 [mdd] [236015.322558] [] mdd_readpage+0x25f/0x5a0 [mdd] [236015.322568] [] mdt_readpage+0x63a/0x880 [mdt] [236015.322609] [] tgt_request_handle+0xaea/0x1580 [ptlrpc] [236015.322647] [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [236015.322653] [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [236015.322689] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [236015.322724] [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [236015.322725] [] ? __wake_up+0x44/0x50 [236015.322760] [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [236015.322762] [] ? __schedule+0x42a/0x860 [236015.322796] [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [236015.322797] [] kthread+0xd1/0xe0 [236015.322799] [] ? insert_kthread_work+0x40/0x40 [236015.322801] [] ret_from_fork_nospec_begin+0xe/0x21 [236015.322802] [] ? insert_kthread_work+0x40/0x40 [236015.322821] Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf b4 b4 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b [236015.331346] NMI watchdog: BUG: soft lockup - CPU#25 stuck for 22s! 
[mdt_rdpg01_008:68122] [236015.331373] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ses enclosure ghash_clmulni_intel dcdbas aesni_intel lrw gf128mul glue_helper ablk_helper ipmi_si cryptd sg ipmi_devintf pcspkr ccp ipmi_msghandler i2c_piix4 k10temp dm_multipath acpi_power_meter dm_mod ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) [236015.331382] ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt mlx5_core(OE) fb_sys_fops ttm mlxfw(OE) devlink ahci libahci mpt3sas(OE) drm tg3 crct10dif_pclmul mlx_compat(OE) crct10dif_common raid_class crc32c_intel libata ptp megaraid_sas scsi_transport_sas drm_panel_orientation_quirks pps_core [last unloaded: libcfs] [236015.331385] CPU: 25 PID: 68122 Comm: mdt_rdpg01_008 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 [236015.331386] Hardware name: Dell Inc. PowerEdge R6415/07YXFK, BIOS 1.10.6 08/15/2019 [236015.331386] task: ffff9a61be08b0c0 ti: ffff9a611a250000 task.ti: ffff9a611a250000 [236015.331390] RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 [236015.331391] RSP: 0018:ffff9a611a253678 EFLAGS: 00000246 [236015.331391] RAX: 0000000000000000 RBX: 000018a200000002 RCX: 0000000000c90000 [236015.331392] RDX: ffff9a61bee5b780 RSI: 0000000000210101 RDI: ffff9a7b0c800c80 [236015.331393] RBP: ffff9a611a253678 R08: ffff9a71bf79b780 R09: 0000000000000000 [236015.331394] R10: ffff9a71bf79f140 R11: fffff4b4cc350a00 R12: ffff9a58c5363000 [236015.331394] R13: ffff9a61b5290000 R14: ffff9a611a253908 R15: ffff9a58c5363328 [236015.331395] FS: 00007f32e01b3700(0000) GS:ffff9a71bf780000(0000) knlGS:0000000000000000 [236015.331396] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [236015.331397] CR2: 00007f32e0224000 CR3: 0000003c68610000 CR4: 00000000003407e0 [236015.331398] Call Trace: [236015.331400] [] queued_spin_lock_slowpath+0xb/0xf [236015.331402] [] _raw_spin_lock+0x20/0x30 [236015.331412] [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] [236015.331419] [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] [236015.331430] [] ? ldiskfs_mark_iloc_dirty+0x68/0x80 [ldiskfs] [236015.331439] [] ? __ldiskfs_journal_stop+0x3c/0xb0 [ldiskfs] [236015.331448] [] ? ldiskfs_dirty_inode+0x54/0x60 [ldiskfs] [236015.331457] [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] [236015.331466] [] ldiskfs_getblk+0x65/0x200 [ldiskfs] [236015.331474] [] ldiskfs_bread+0x27/0xc0 [ldiskfs] [236015.331481] [] __ldiskfs_read_dirblock+0x4a/0x400 [ldiskfs] [236015.331488] [] dx_probe+0xa2/0xa20 [ldiskfs] [236015.331496] [] ? __ldiskfs_get_inode_loc+0xe3/0x3c0 [ldiskfs] [236015.331502] [] ? ldiskfs_xattr_find_entry+0x9f/0x130 [ldiskfs] [236015.331509] [] ldiskfs_htree_fill_tree+0x199/0x2f0 [ldiskfs] [236015.331515] [] ldiskfs_readdir+0x61c/0x850 [ldiskfs] [236015.331523] [] ? osd_object_alloc+0x360/0x360 [osd_ldiskfs] [236015.331525] [] ? 
generic_getxattr+0x52/0x70 [236015.331532] [] osd_ldiskfs_it_fill+0xbe/0x260 [osd_ldiskfs] [236015.331539] [] osd_it_ea_load+0x37/0x100 [osd_ldiskfs] [236015.331547] [] lod_it_load+0x27/0x90 [lod] [236015.331569] [] dt_index_walk+0xf8/0x430 [obdclass] [236015.331579] [] ? mdd_object_lock+0xe0/0xe0 [mdd] [236015.331587] [] mdd_readpage+0x25f/0x5a0 [mdd] [236015.331598] [] mdt_readpage+0x63a/0x880 [mdt] [236015.331638] [] tgt_request_handle+0xaea/0x1580 [ptlrpc] [236015.331676] [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [236015.331683] [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [236015.331718] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [236015.331752] [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [236015.331754] [] ? __wake_up+0x44/0x50 [236015.331787] [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [236015.331789] [] ? __schedule+0x42a/0x860 [236015.331822] [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [236015.331824] [] kthread+0xd1/0xe0 [236015.331825] [] ? insert_kthread_work+0x40/0x40 [236015.331827] [] ret_from_fork_nospec_begin+0xe/0x21 [236015.331829] [] ? insert_kthread_work+0x40/0x40 [236015.331848] Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf b4 b4 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b [236015.356346] NMI watchdog: BUG: soft lockup - CPU#28 stuck for 22s! [mdt_rdpg00_000:67224] [236015.356374] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ses enclosure ghash_clmulni_intel dcdbas aesni_intel lrw gf128mul glue_helper ablk_helper ipmi_si cryptd sg ipmi_devintf pcspkr ccp ipmi_msghandler i2c_piix4 k10temp dm_multipath acpi_power_meter dm_mod ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) [236015.356384] ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt mlx5_core(OE) fb_sys_fops ttm mlxfw(OE) devlink ahci libahci mpt3sas(OE) drm tg3 crct10dif_pclmul mlx_compat(OE) crct10dif_common raid_class crc32c_intel libata ptp megaraid_sas scsi_transport_sas drm_panel_orientation_quirks pps_core [last unloaded: libcfs] [236015.356386] CPU: 28 PID: 67224 Comm: mdt_rdpg00_000 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 [236015.356387] Hardware name: Dell Inc. 
PowerEdge R6415/07YXFK, BIOS 1.10.6 08/15/2019 [236015.356388] task: ffff9a61b21fc100 ti: ffff9a819eb64000 task.ti: ffff9a819eb64000 [236015.356391] RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 [236015.356392] RSP: 0018:ffff9a819eb67678 EFLAGS: 00000246 [236015.356393] RAX: 0000000000000000 RBX: 00001f9400000002 RCX: 0000000000e10000 [236015.356394] RDX: ffff9a71bf79b780 RSI: 0000000000c90101 RDI: ffff9a7b0c800c80 [236015.356394] RBP: ffff9a819eb67678 R08: ffff9a61befdb780 R09: 0000000000000000 [236015.356395] R10: ffff9a61befdf140 R11: fffff4b4527b0800 R12: ffff9a6dedccd800 [236015.356396] R13: ffff9a61b5290000 R14: ffff9a819eb67908 R15: ffff9a6dedccdb28 [236015.356397] FS: 00007f3c307ae880(0000) GS:ffff9a61befc0000(0000) knlGS:0000000000000000 [236015.356398] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [236015.356399] CR2: 00007f1077319000 CR3: 000000302ffc0000 CR4: 00000000003407e0 [236015.356399] Call Trace: [236015.356402] [] queued_spin_lock_slowpath+0xb/0xf [236015.356404] [] _raw_spin_lock+0x20/0x30 [236015.356415] [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] [236015.356423] [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] [236015.356434] [] ? ldiskfs_mark_iloc_dirty+0x68/0x80 [ldiskfs] [236015.356435] [] ? __brelse+0x3d/0x50 [236015.356437] [] ? bh_lru_install+0x18a/0x1e0 [236015.356447] [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] [236015.356457] [] ldiskfs_getblk+0x65/0x200 [ldiskfs] [236015.356466] [] ldiskfs_bread+0x27/0xc0 [ldiskfs] [236015.356473] [] __ldiskfs_read_dirblock+0x4a/0x400 [ldiskfs] [236015.356480] [] dx_probe+0xa2/0xa20 [ldiskfs] [236015.356489] [] ? __ldiskfs_get_inode_loc+0xe3/0x3c0 [ldiskfs] [236015.356495] [] ? ldiskfs_xattr_find_entry+0x9f/0x130 [ldiskfs] [236015.356503] [] ldiskfs_htree_fill_tree+0x199/0x2f0 [ldiskfs] [236015.356509] [] ldiskfs_readdir+0x61c/0x850 [ldiskfs] [236015.356517] [] ? osd_object_alloc+0x360/0x360 [osd_ldiskfs] [236015.356518] [] ? generic_getxattr+0x52/0x70 [236015.356525] [] osd_ldiskfs_it_fill+0xbe/0x260 [osd_ldiskfs] [236015.356533] [] osd_it_ea_load+0x37/0x100 [osd_ldiskfs] [236015.356541] [] lod_it_load+0x27/0x90 [lod] [236015.356564] [] dt_index_walk+0xf8/0x430 [obdclass] [236015.356573] [] ? mdd_object_lock+0xe0/0xe0 [mdd] [236015.356581] [] mdd_readpage+0x25f/0x5a0 [mdd] [236015.356592] [] mdt_readpage+0x63a/0x880 [mdt] [236015.356633] [] tgt_request_handle+0xaea/0x1580 [ptlrpc] [236015.356670] [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [236015.356676] [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [236015.356711] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [236015.356745] [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [236015.356747] [] ? __wake_up+0x44/0x50 [236015.356781] [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [236015.356783] [] ? __schedule+0x42a/0x860 [236015.356816] [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [236015.356818] [] kthread+0xd1/0xe0 [236015.356819] [] ? insert_kthread_work+0x40/0x40 [236015.356821] [] ret_from_fork_nospec_begin+0xe/0x21 [236015.356823] [] ? insert_kthread_work+0x40/0x40 [236015.356842] Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf b4 b4 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b [236015.365347] NMI watchdog: BUG: soft lockup - CPU#29 stuck for 22s! 
[mdt_rdpg01_005:68048] [236015.365384] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ses enclosure ghash_clmulni_intel dcdbas aesni_intel lrw gf128mul glue_helper ablk_helper ipmi_si cryptd sg ipmi_devintf pcspkr ccp ipmi_msghandler i2c_piix4 k10temp dm_multipath acpi_power_meter dm_mod ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) [236015.365395] ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt mlx5_core(OE) fb_sys_fops ttm mlxfw(OE) devlink ahci libahci mpt3sas(OE) drm tg3 crct10dif_pclmul mlx_compat(OE) crct10dif_common raid_class crc32c_intel libata ptp megaraid_sas scsi_transport_sas drm_panel_orientation_quirks pps_core [last unloaded: libcfs] [236015.365398] CPU: 29 PID: 68048 Comm: mdt_rdpg01_005 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 [236015.365399] Hardware name: Dell Inc. PowerEdge R6415/07YXFK, BIOS 1.10.6 08/15/2019 [236015.365400] task: ffff9a5fb69ea080 ti: ffff9a6114e98000 task.ti: ffff9a6114e98000 [236015.365404] RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x126/0x200 [236015.365405] RSP: 0018:ffff9a6114e9b6c0 EFLAGS: 00000246 [236015.365406] RAX: 0000000000000000 RBX: ffff9a83ed279680 RCX: 0000000000e90000 [236015.365407] RDX: ffff9a71bf8db780 RSI: 0000000001690101 RDI: ffff9a7b0c800c80 [236015.365408] RBP: ffff9a6114e9b6c0 R08: ffff9a71bf7db780 R09: 0000000000000000 [236015.365408] R10: ffff9a71bf7df140 R11: fffff4b484788a00 R12: ffff9a6114e9b690 [236015.365409] R13: ffff9a81ac460501 R14: 0000000000000001 R15: ffffffffc050b2d1 [236015.365410] FS: 00007ff63de11740(0000) GS:ffff9a71bf7c0000(0000) knlGS:0000000000000000 [236015.365411] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [236015.365412] CR2: 0000000000485000 CR3: 0000003c68610000 CR4: 00000000003407e0 [236015.365413] Call Trace: [236015.365416] [] queued_spin_lock_slowpath+0xb/0xf [236015.365419] [] _raw_spin_lock+0x20/0x30 [236015.365431] [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] [236015.365438] [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] [236015.365442] [] ? __rmqueue+0x8a/0x460 [236015.365452] [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] [236015.365455] [] ? __brelse+0x3d/0x50 [236015.365465] [] ldiskfs_getblk+0x65/0x200 [ldiskfs] [236015.365474] [] ldiskfs_bread+0x27/0xc0 [ldiskfs] [236015.365475] [] ? __find_get_block+0xbc/0x120 [236015.365483] [] __ldiskfs_read_dirblock+0x4a/0x400 [ldiskfs] [236015.365490] [] htree_dirblock_to_tree+0x40/0x190 [ldiskfs] [236015.365498] [] ldiskfs_htree_fill_tree+0xb5/0x2f0 [ldiskfs] [236015.365501] [] ? kmem_cache_alloc_trace+0x1d6/0x200 [236015.365507] [] ldiskfs_readdir+0x61c/0x850 [ldiskfs] [236015.365516] [] ? osd_object_alloc+0x360/0x360 [osd_ldiskfs] [236015.365518] [] ? generic_getxattr+0x52/0x70 [236015.365525] [] osd_ldiskfs_it_fill+0xbe/0x260 [osd_ldiskfs] [236015.365532] [] osd_it_ea_load+0x37/0x100 [osd_ldiskfs] [236015.365543] [] lod_it_load+0x27/0x90 [lod] [236015.365571] [] dt_index_walk+0xf8/0x430 [obdclass] [236015.365581] [] ? 
mdd_object_lock+0xe0/0xe0 [mdd] [236015.365590] [] mdd_readpage+0x25f/0x5a0 [mdd] [236015.365604] [] mdt_readpage+0x63a/0x880 [mdt] [236015.365648] [] tgt_request_handle+0xaea/0x1580 [ptlrpc] [236015.365687] [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [236015.365695] [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [236015.365731] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [236015.365766] [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [236015.365769] [] ? __wake_up+0x44/0x50 [236015.365804] [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [236015.365838] [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [236015.365840] [] kthread+0xd1/0xe0 [236015.365841] [] ? insert_kthread_work+0x40/0x40 [236015.365844] [] ret_from_fork_nospec_begin+0xe/0x21 [236015.365845] [] ? insert_kthread_work+0x40/0x40 [236015.365865] Code: 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf b4 b4 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 41 8b 40 08 <85> c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b 17 0f b7 c2 [236015.374347] NMI watchdog: BUG: soft lockup - CPU#30 stuck for 22s! [mdt_rdpg02_002:67776] [236015.374373] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ses enclosure ghash_clmulni_intel dcdbas aesni_intel lrw gf128mul glue_helper ablk_helper ipmi_si cryptd sg ipmi_devintf pcspkr ccp ipmi_msghandler i2c_piix4 k10temp dm_multipath acpi_power_meter dm_mod ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) [236015.374382] ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt mlx5_core(OE) fb_sys_fops ttm mlxfw(OE) devlink ahci libahci mpt3sas(OE) drm tg3 crct10dif_pclmul mlx_compat(OE) crct10dif_common raid_class crc32c_intel libata ptp megaraid_sas scsi_transport_sas drm_panel_orientation_quirks pps_core [last unloaded: libcfs] [236015.374384] CPU: 30 PID: 67776 Comm: mdt_rdpg02_002 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 [236015.374385] Hardware name: Dell Inc. 
PowerEdge R6415/07YXFK, BIOS 1.10.6 08/15/2019 [236015.374386] task: ffff9a6097258000 ti: ffff9a619c344000 task.ti: ffff9a619c344000 [236015.374388] RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x126/0x200 [236015.374389] RSP: 0018:ffff9a619c3476c0 EFLAGS: 00000246 [236015.374390] RAX: 0000000000000000 RBX: ffff9a83ed2797b8 RCX: 0000000000f10000 [236015.374391] RDX: ffff9a61bf01b780 RSI: 0000000001010101 RDI: ffff9a7b0c800c80 [236015.374392] RBP: ffff9a619c3476c0 R08: ffff9a81bf7db780 R09: 0000000000000000 [236015.374392] R10: ffff9a81bf7df140 R11: fffff4b4de67a800 R12: ffff9a619c347690 [236015.374393] R13: ffff9a81ac460501 R14: 0000000000000001 R15: ffffffffc050b2d1 [236015.374394] FS: 00007f417cd85700(0000) GS:ffff9a81bf7c0000(0000) knlGS:0000000000000000 [236015.374395] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [236015.374396] CR2: 00007f41828ed000 CR3: 0000003c68610000 CR4: 00000000003407e0 [236015.374397] Call Trace: [236015.374399] [] queued_spin_lock_slowpath+0xb/0xf [236015.374401] [] _raw_spin_lock+0x20/0x30 [236015.374411] [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] [236015.374418] [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] [236015.374428] [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] [236015.374430] [] ? __brelse+0x3d/0x50 [236015.374439] [] ldiskfs_getblk+0x65/0x200 [ldiskfs] [236015.374447] [] ldiskfs_bread+0x27/0xc0 [ldiskfs] [236015.374449] [] ? __find_get_block+0xbc/0x120 [236015.374456] [] __ldiskfs_read_dirblock+0x4a/0x400 [ldiskfs] [236015.374463] [] htree_dirblock_to_tree+0x40/0x190 [ldiskfs] [236015.374471] [] ldiskfs_htree_fill_tree+0xb5/0x2f0 [ldiskfs] [236015.374473] [] ? kmem_cache_alloc_trace+0x1d6/0x200 [236015.374479] [] ldiskfs_readdir+0x61c/0x850 [ldiskfs] [236015.374487] [] ? osd_object_alloc+0x360/0x360 [osd_ldiskfs] [236015.374488] [] ? generic_getxattr+0x52/0x70 [236015.374495] [] osd_ldiskfs_it_fill+0xbe/0x260 [osd_ldiskfs] [236015.374502] [] osd_it_ea_load+0x37/0x100 [osd_ldiskfs] [236015.374511] [] lod_it_load+0x27/0x90 [lod] [236015.374533] [] dt_index_walk+0xf8/0x430 [obdclass] [236015.374542] [] ? mdd_object_lock+0xe0/0xe0 [mdd] [236015.374550] [] mdd_readpage+0x25f/0x5a0 [mdd] [236015.374561] [] mdt_readpage+0x63a/0x880 [mdt] [236015.374601] [] tgt_request_handle+0xaea/0x1580 [ptlrpc] [236015.374639] [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [236015.374645] [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [236015.374680] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [236015.374715] [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [236015.374717] [] ? __wake_up+0x44/0x50 [236015.374751] [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [236015.374785] [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [236015.374786] [] kthread+0xd1/0xe0 [236015.374788] [] ? insert_kthread_work+0x40/0x40 [236015.374790] [] ret_from_fork_nospec_begin+0xe/0x21 [236015.374791] [] ? insert_kthread_work+0x40/0x40 [236015.374810] Code: 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf b4 b4 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 41 8b 40 08 <85> c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b 17 0f b7 c2 [236015.390347] NMI watchdog: BUG: soft lockup - CPU#32 stuck for 22s! 
[mdt_rdpg00_009:68144] [236015.390373] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ses enclosure ghash_clmulni_intel dcdbas aesni_intel lrw gf128mul glue_helper ablk_helper ipmi_si cryptd sg ipmi_devintf pcspkr ccp ipmi_msghandler i2c_piix4 k10temp dm_multipath acpi_power_meter dm_mod ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) [236015.390382] ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt mlx5_core(OE) fb_sys_fops ttm mlxfw(OE) devlink ahci libahci mpt3sas(OE) drm tg3 crct10dif_pclmul mlx_compat(OE) crct10dif_common raid_class crc32c_intel libata ptp megaraid_sas scsi_transport_sas drm_panel_orientation_quirks pps_core [last unloaded: libcfs] [236015.390385] CPU: 32 PID: 68144 Comm: mdt_rdpg00_009 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 [236015.390385] Hardware name: Dell Inc. PowerEdge R6415/07YXFK, BIOS 1.10.6 08/15/2019 [236015.390386] task: ffff9a7d76bb30c0 ti: ffff9a71b0724000 task.ti: ffff9a71b0724000 [236015.390389] RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x126/0x200 [236015.390390] RSP: 0018:ffff9a71b07276c0 EFLAGS: 00000246 [236015.390391] RAX: 0000000000000000 RBX: ffff9a83ed2798f0 RCX: 0000000001010000 [236015.390391] RDX: ffff9a71bf7db780 RSI: 0000000000e90101 RDI: ffff9a7b0c800c80 [236015.390392] RBP: ffff9a71b07276c0 R08: ffff9a61bf01b780 R09: 0000000000000000 [236015.390393] R10: ffff9a61bf01f140 R11: fffff4b451d08e00 R12: ffff9a71b0727690 [236015.390393] R13: ffff9a81ac460501 R14: 0000000000000001 R15: ffffffffc050b2d1 [236015.390395] FS: 00007fd397282740(0000) GS:ffff9a61bf000000(0000) knlGS:0000000000000000 [236015.390396] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [236015.390396] CR2: 00007fd396e6d1cc CR3: 0000003c68610000 CR4: 00000000003407e0 [236015.390397] Call Trace: [236015.390399] [] queued_spin_lock_slowpath+0xb/0xf [236015.390401] [] _raw_spin_lock+0x20/0x30 [236015.390412] [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] [236015.390419] [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] [236015.390429] [] ? osd_write+0x15b/0x5c0 [osd_ldiskfs] [236015.390439] [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] [236015.390441] [] ? __brelse+0x3d/0x50 [236015.390450] [] ldiskfs_getblk+0x65/0x200 [ldiskfs] [236015.390459] [] ldiskfs_bread+0x27/0xc0 [ldiskfs] [236015.390461] [] ? __find_get_block+0xbc/0x120 [236015.390468] [] __ldiskfs_read_dirblock+0x4a/0x400 [ldiskfs] [236015.390475] [] htree_dirblock_to_tree+0x40/0x190 [ldiskfs] [236015.390482] [] ldiskfs_htree_fill_tree+0xb5/0x2f0 [ldiskfs] [236015.390485] [] ? kmem_cache_alloc_trace+0x1d6/0x200 [236015.390490] [] ldiskfs_readdir+0x61c/0x850 [ldiskfs] [236015.390498] [] ? osd_object_alloc+0x360/0x360 [osd_ldiskfs] [236015.390499] [] ? generic_getxattr+0x52/0x70 [236015.390506] [] osd_ldiskfs_it_fill+0xbe/0x260 [osd_ldiskfs] [236015.390513] [] osd_it_ea_load+0x37/0x100 [osd_ldiskfs] [236015.390521] [] lod_it_load+0x27/0x90 [lod] [236015.390544] [] dt_index_walk+0xf8/0x430 [obdclass] [236015.390552] [] ? 
mdd_object_lock+0xe0/0xe0 [mdd] [236015.390560] [] mdd_readpage+0x25f/0x5a0 [mdd] [236015.390571] [] mdt_readpage+0x63a/0x880 [mdt] [236015.390612] [] tgt_request_handle+0xaea/0x1580 [ptlrpc] [236015.390649] [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [236015.390655] [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [236015.390690] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [236015.390725] [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [236015.390726] [] ? __wake_up+0x44/0x50 [236015.390760] [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [236015.390794] [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [236015.390795] [] kthread+0xd1/0xe0 [236015.390797] [] ? insert_kthread_work+0x40/0x40 [236015.390799] [] ret_from_fork_nospec_begin+0xe/0x21 [236015.390800] [] ? insert_kthread_work+0x40/0x40 [236015.390819] Code: 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf b4 b4 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 41 8b 40 08 <85> c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b 17 0f b7 c2 [236015.399349] NMI watchdog: BUG: soft lockup - CPU#33 stuck for 22s! [mdt_rdpg01_000:67226] [236015.399374] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ses enclosure ghash_clmulni_intel dcdbas aesni_intel lrw gf128mul glue_helper ablk_helper ipmi_si cryptd sg ipmi_devintf pcspkr ccp ipmi_msghandler i2c_piix4 k10temp dm_multipath acpi_power_meter dm_mod ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) [236015.399383] ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt mlx5_core(OE) fb_sys_fops ttm mlxfw(OE) devlink ahci libahci mpt3sas(OE) drm tg3 crct10dif_pclmul mlx_compat(OE) crct10dif_common raid_class crc32c_intel libata ptp megaraid_sas scsi_transport_sas drm_panel_orientation_quirks pps_core [last unloaded: libcfs] [236015.399385] CPU: 33 PID: 67226 Comm: mdt_rdpg01_000 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 [236015.399385] Hardware name: Dell Inc. 
PowerEdge R6415/07YXFK, BIOS 1.10.6 08/15/2019 [236015.399386] task: ffff9a61b21fa080 ti: ffff9a81a431c000 task.ti: ffff9a81a431c000 [236015.399390] RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x126/0x200 [236015.399391] RSP: 0018:ffff9a81a431f6c0 EFLAGS: 00000246 [236015.399392] RAX: 0000000000000000 RBX: ffff9a71bf81f140 RCX: 0000000001090000 [236015.399393] RDX: ffff9a61bee1b780 RSI: 0000000000010101 RDI: ffff9a7b0c800c80 [236015.399393] RBP: ffff9a81a431f6c0 R08: ffff9a71bf81b780 R09: 0000000000000000 [236015.399394] R10: ffff9a71bf81f140 R11: fffff4b44b4d8000 R12: ffff9a81a431f670 [236015.399395] R13: ffffffffc156b0b2 R14: 00000000ffffffff R15: 0000000000008050 [236015.399396] FS: 00007fd838f0d740(0000) GS:ffff9a71bf800000(0000) knlGS:0000000000000000 [236015.399397] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [236015.399398] CR2: 00007fd838790320 CR3: 0000003c68610000 CR4: 00000000003407e0 [236015.399398] Call Trace: [236015.399401] [] queued_spin_lock_slowpath+0xb/0xf [236015.399403] [] _raw_spin_lock+0x20/0x30 [236015.399415] [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] [236015.399422] [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] [236015.399433] [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] [236015.399435] [] ? wake_up_bit+0x25/0x30 [236015.399445] [] ldiskfs_getblk+0x65/0x200 [ldiskfs] [236015.399454] [] ldiskfs_bread+0x27/0xc0 [ldiskfs] [236015.399456] [] ? __find_get_block+0xbc/0x120 [236015.399463] [] __ldiskfs_read_dirblock+0x4a/0x400 [ldiskfs] [236015.399470] [] htree_dirblock_to_tree+0x40/0x190 [ldiskfs] [236015.399477] [] ldiskfs_htree_fill_tree+0xb5/0x2f0 [ldiskfs] [236015.399479] [] ? kmem_cache_alloc_trace+0x1d6/0x200 [236015.399485] [] ldiskfs_readdir+0x61c/0x850 [ldiskfs] [236015.399492] [] ? osd_object_alloc+0x360/0x360 [osd_ldiskfs] [236015.399493] [] ? generic_getxattr+0x52/0x70 [236015.399500] [] osd_ldiskfs_it_fill+0xbe/0x260 [osd_ldiskfs] [236015.399507] [] osd_it_ea_load+0x37/0x100 [osd_ldiskfs] [236015.399515] [] lod_it_load+0x27/0x90 [lod] [236015.399536] [] dt_index_walk+0xf8/0x430 [obdclass] [236015.399544] [] ? mdd_object_lock+0xe0/0xe0 [mdd] [236015.399551] [] mdd_readpage+0x25f/0x5a0 [mdd] [236015.399562] [] mdt_readpage+0x63a/0x880 [mdt] [236015.399600] [] tgt_request_handle+0xaea/0x1580 [ptlrpc] [236015.399636] [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [236015.399642] [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [236015.399676] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [236015.399710] [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [236015.399711] [] ? __wake_up+0x44/0x50 [236015.399744] [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [236015.399777] [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [236015.399778] [] kthread+0xd1/0xe0 [236015.399780] [] ? insert_kthread_work+0x40/0x40 [236015.399782] [] ret_from_fork_nospec_begin+0xe/0x21 [236015.399783] [] ? insert_kthread_work+0x40/0x40 [236015.399801] Code: 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf b4 b4 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 41 8b 40 08 <85> c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b 17 0f b7 c2 [236015.424348] NMI watchdog: BUG: soft lockup - CPU#36 stuck for 22s! 
[mdt_io00_020:68011] [236015.424374] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ses enclosure ghash_clmulni_intel dcdbas aesni_intel lrw gf128mul glue_helper ablk_helper ipmi_si cryptd sg ipmi_devintf pcspkr ccp ipmi_msghandler i2c_piix4 k10temp dm_multipath acpi_power_meter dm_mod ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) [236015.424383] ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt mlx5_core(OE) fb_sys_fops ttm mlxfw(OE) devlink ahci libahci mpt3sas(OE) drm tg3 crct10dif_pclmul mlx_compat(OE) crct10dif_common raid_class crc32c_intel libata ptp megaraid_sas scsi_transport_sas drm_panel_orientation_quirks pps_core [last unloaded: libcfs] [236015.424386] CPU: 36 PID: 68011 Comm: mdt_io00_020 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 [236015.424386] Hardware name: Dell Inc. PowerEdge R6415/07YXFK, BIOS 1.10.6 08/15/2019 [236015.424387] task: ffff9a6143624100 ti: ffff9a603bb88000 task.ti: ffff9a603bb88000 [236015.424390] RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 [236015.424391] RSP: 0018:ffff9a603bb8b798 EFLAGS: 00000246 [236015.424392] RAX: 0000000000000000 RBX: ffff9a61b0e3b900 RCX: 0000000001210000 [236015.424393] RDX: ffff9a71bf6db780 RSI: 0000000000690101 RDI: ffff9a7b0c800c80 [236015.424393] RBP: ffff9a603bb8b798 R08: ffff9a61bf05b780 R09: 0000000000000000 [236015.424394] R10: ffff9a61bf05f140 R11: fffff4b460887000 R12: ffffffffc159c71b [236015.424395] R13: ffff9a603bb8b758 R14: 000000000000081f R15: ffffffffc16c0ea0 [236015.424396] FS: 00007fdd84544880(0000) GS:ffff9a61bf040000(0000) knlGS:0000000000000000 [236015.424397] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [236015.424398] CR2: 00007f67cfed0320 CR3: 0000001036c42000 CR4: 00000000003407e0 [236015.424398] Call Trace: [236015.424400] [] queued_spin_lock_slowpath+0xb/0xf [236015.424402] [] _raw_spin_lock+0x20/0x30 [236015.424413] [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] [236015.424420] [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] [236015.424430] [] ? osd_write+0x15b/0x5c0 [osd_ldiskfs] [236015.424433] [] ? account_entity_dequeue+0xae/0xd0 [236015.424435] [] ? ktime_get_ts64+0x52/0xf0 [236015.424445] [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] [236015.424453] [] ? cfs_hash_bd_lookup_intent+0x63/0x170 [libcfs] [236015.424474] [] ? lprocfs_counter_add+0xf9/0x160 [obdclass] [236015.424484] [] osd_ldiskfs_map_inode_pages+0x143/0x420 [osd_ldiskfs] [236015.424494] [] osd_read_prep+0x2de/0x400 [osd_ldiskfs] [236015.424509] [] mdt_obd_preprw+0xd9b/0x10a0 [mdt] [236015.424550] [] tgt_brw_read+0x9db/0x1e50 [ptlrpc] [236015.424584] [] ? ptl_send_buf+0x146/0x530 [ptlrpc] [236015.424604] [] ? lprocfs_counter_add+0xf9/0x160 [obdclass] [236015.424641] [] ? null_alloc_rs+0x186/0x340 [ptlrpc] [236015.424675] [] ? lustre_pack_reply_v2+0x135/0x290 [ptlrpc] [236015.424709] [] ? lustre_pack_reply_flags+0x6f/0x1e0 [ptlrpc] [236015.424743] [] ? 
lustre_pack_reply+0x11/0x20 [ptlrpc] [236015.424782] [] tgt_request_handle+0xaea/0x1580 [ptlrpc] [236015.424819] [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [236015.424825] [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [236015.424859] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [236015.424893] [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [236015.424895] [] ? __wake_up+0x44/0x50 [236015.424929] [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [236015.424962] [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [236015.424964] [] kthread+0xd1/0xe0 [236015.424966] [] ? insert_kthread_work+0x40/0x40 [236015.424968] [] ret_from_fork_nospec_begin+0xe/0x21 [236015.424969] [] ? insert_kthread_work+0x40/0x40 [236015.424989] Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf b4 b4 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b [236015.433351] NMI watchdog: BUG: soft lockup - CPU#37 stuck for 22s! [mdt_rdpg01_006:68119] [236015.433389] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ses enclosure ghash_clmulni_intel dcdbas aesni_intel lrw gf128mul glue_helper ablk_helper ipmi_si cryptd sg ipmi_devintf pcspkr ccp ipmi_msghandler i2c_piix4 k10temp dm_multipath acpi_power_meter dm_mod ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) [236015.433402] ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt mlx5_core(OE) fb_sys_fops ttm mlxfw(OE) devlink ahci libahci mpt3sas(OE) drm tg3 crct10dif_pclmul mlx_compat(OE) crct10dif_common raid_class crc32c_intel libata ptp megaraid_sas scsi_transport_sas drm_panel_orientation_quirks pps_core [last unloaded: libcfs] [236015.433405] CPU: 37 PID: 68119 Comm: mdt_rdpg01_006 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 [236015.433406] Hardware name: Dell Inc. PowerEdge R6415/07YXFK, BIOS 1.10.6 08/15/2019 [236015.433407] task: ffff9a61be088000 ti: ffff9a6118b98000 task.ti: ffff9a6118b98000 [236015.433420] RIP: 0010:[] [] ldiskfs_inode_touch_time_cmp+0xd/0x90 [ldiskfs] [236015.433421] RSP: 0018:ffff9a6118b9b728 EFLAGS: 00000286 [236015.433422] RAX: 8000040400080000 RBX: ffffffffb3e9b6f4 RCX: 0000000107e2ac6b [236015.433423] RDX: ffff9a9118cfcee8 RSI: ffff9a5d765d07b8 RDI: 0000000000000000 [236015.433424] RBP: ffff9a6118b9b778 R08: 000000000000000a R09: 0000000000000000 [236015.433425] R10: 000000000000144b R11: ffff9a6118b9b486 R12: 0000000000000006 [236015.433425] R13: 0000000000000032 R14: 0000000000000000 R15: 0000000000000000 [236015.433427] FS: 00007f5719697700(0000) GS:ffff9a71bf840000(0000) knlGS:0000000000000000 [236015.433428] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [236015.433429] CR2: 00007f06bf34c80d CR3: 00000020371ac000 CR4: 00000000003407e0 [236015.433430] Call Trace: [236015.433434] [] ? merge+0x62/0xc0 [236015.433444] [] ? 
ldiskfs_init_inode_table+0x410/0x410 [ldiskfs] [236015.433446] [] list_sort+0x9b/0x250 [236015.433455] [] __ldiskfs_es_shrink+0x1ce/0x2a0 [ldiskfs] [236015.433464] [] ldiskfs_es_shrink+0xb4/0x130 [ldiskfs] [236015.433468] [] shrink_slab+0x175/0x340 [236015.433472] [] ? vmpressure+0x61/0x90 [236015.433474] [] zone_reclaim+0x1d1/0x2f0 [236015.433477] [] get_page_from_freelist+0x87b/0xa70 [236015.433479] [] __alloc_pages_nodemask+0x176/0x420 [236015.433516] [] ? lustre_pack_reply_v2+0x135/0x290 [ptlrpc] [236015.433520] [] alloc_pages_current+0x98/0x110 [236015.433534] [] mdt_readpage+0x3cc/0x880 [mdt] [236015.433577] [] tgt_request_handle+0xaea/0x1580 [ptlrpc] [236015.433615] [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [236015.433622] [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [236015.433657] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [236015.433691] [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [236015.433693] [] ? __wake_up+0x44/0x50 [236015.433728] [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [236015.433761] [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [236015.433763] [] kthread+0xd1/0xe0 [236015.433764] [] ? insert_kthread_work+0x40/0x40 [236015.433766] [] ret_from_fork_nospec_begin+0xe/0x21 [236015.433768] [] ? insert_kthread_work+0x40/0x40 [236015.433787] Code: ff 8d 4a 01 89 d0 f0 0f b1 0f 39 d0 0f 84 fb fd ff ff 89 c2 eb e2 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 8b 86 e0 fc ff ff <48> 89 e5 48 c1 e8 2b a8 01 74 15 48 8b 8a e0 fc ff ff b8 01 00 [236015.442349] NMI watchdog: BUG: soft lockup - CPU#38 stuck for 22s! [mdt_rdpg02_003:67876] [236015.442376] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ses enclosure ghash_clmulni_intel dcdbas aesni_intel lrw gf128mul glue_helper ablk_helper ipmi_si cryptd sg ipmi_devintf pcspkr ccp ipmi_msghandler i2c_piix4 k10temp dm_multipath acpi_power_meter dm_mod ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) [236015.442386] ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt mlx5_core(OE) fb_sys_fops ttm mlxfw(OE) devlink ahci libahci mpt3sas(OE) drm tg3 crct10dif_pclmul mlx_compat(OE) crct10dif_common raid_class crc32c_intel libata ptp megaraid_sas scsi_transport_sas drm_panel_orientation_quirks pps_core [last unloaded: libcfs] [236015.442388] CPU: 38 PID: 67876 Comm: mdt_rdpg02_003 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 [236015.442389] Hardware name: Dell Inc. 
PowerEdge R6415/07YXFK, BIOS 1.10.6 08/15/2019 [236015.442390] task: ffff9a617daf2080 ti: ffff9a61b279c000 task.ti: ffff9a61b279c000 [236015.442394] RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 [236015.442395] RSP: 0018:ffff9a61b279f678 EFLAGS: 00000246 [236015.442396] RAX: 0000000000000000 RBX: ffff9a74d82ccd00 RCX: 0000000001310000 [236015.442397] RDX: ffff9a81bf8db780 RSI: 0000000001710101 RDI: ffff9a7b0c800c80 [236015.442398] RBP: ffff9a61b279f678 R08: ffff9a81bf85b780 R09: 0000000000000000 [236015.442398] R10: 000000001f70fe01 R11: fffff4b4e27dc200 R12: ffff9a61b279f6a0 [236015.442399] R13: fffff4b4e27dc200 R14: ffffffffc15bae50 R15: 0000037600000000 [236015.442401] FS: 00007f418058c700(0000) GS:ffff9a81bf840000(0000) knlGS:0000000000000000 [236015.442402] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [236015.442403] CR2: 00007f128406d000 CR3: 0000003c68610000 CR4: 00000000003407e0 [236015.442403] Call Trace: [236015.442406] [] queued_spin_lock_slowpath+0xb/0xf [236015.442408] [] _raw_spin_lock+0x20/0x30 [236015.442421] [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] [236015.442429] [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] [236015.442441] [] ? osd_write+0x15b/0x5c0 [osd_ldiskfs] [236015.442452] [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] [236015.442464] [] ? lod_sub_write+0x1d0/0x410 [lod] [236015.442506] [] ? tgt_last_rcvd_update+0x6be/0xc90 [ptlrpc] [236015.442518] [] ldiskfs_getblk+0x65/0x200 [ldiskfs] [236015.442528] [] ldiskfs_bread+0x27/0xc0 [ldiskfs] [236015.442536] [] __ldiskfs_read_dirblock+0x4a/0x400 [ldiskfs] [236015.442544] [] dx_probe+0xa2/0xa20 [ldiskfs] [236015.442554] [] ? __ldiskfs_get_inode_loc+0xe3/0x3c0 [ldiskfs] [236015.442560] [] ? ldiskfs_xattr_find_entry+0x9f/0x130 [ldiskfs] [236015.442568] [] ldiskfs_htree_fill_tree+0x199/0x2f0 [ldiskfs] [236015.442574] [] ldiskfs_readdir+0x61c/0x850 [ldiskfs] [236015.442582] [] ? osd_object_alloc+0x360/0x360 [osd_ldiskfs] [236015.442583] [] ? generic_getxattr+0x52/0x70 [236015.442590] [] osd_ldiskfs_it_fill+0xbe/0x260 [osd_ldiskfs] [236015.442598] [] osd_it_ea_load+0x37/0x100 [osd_ldiskfs] [236015.442609] [] lod_it_load+0x27/0x90 [lod] [236015.442631] [] dt_index_walk+0xf8/0x430 [obdclass] [236015.442640] [] ? mdd_object_lock+0xe0/0xe0 [mdd] [236015.442649] [] mdd_readpage+0x25f/0x5a0 [mdd] [236015.442659] [] mdt_readpage+0x63a/0x880 [mdt] [236015.442700] [] tgt_request_handle+0xaea/0x1580 [ptlrpc] [236015.442739] [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [236015.442746] [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [236015.442782] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [236015.442818] [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [236015.442820] [] ? __wake_up+0x44/0x50 [236015.442856] [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [236015.442858] [] ? __schedule+0x42a/0x860 [236015.442893] [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [236015.442894] [] kthread+0xd1/0xe0 [236015.442896] [] ? insert_kthread_work+0x40/0x40 [236015.442898] [] ret_from_fork_nospec_begin+0xe/0x21 [236015.442899] [] ? insert_kthread_work+0x40/0x40 [236015.442920] Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf b4 b4 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b [236015.458352] NMI watchdog: BUG: soft lockup - CPU#40 stuck for 22s! 
[mdt_rdpg00_014:70187] [236015.458379] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ses enclosure ghash_clmulni_intel dcdbas aesni_intel lrw gf128mul glue_helper ablk_helper ipmi_si cryptd sg ipmi_devintf pcspkr ccp ipmi_msghandler i2c_piix4 k10temp dm_multipath acpi_power_meter dm_mod ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) [236015.458388] ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt mlx5_core(OE) fb_sys_fops ttm mlxfw(OE) devlink ahci libahci mpt3sas(OE) drm tg3 crct10dif_pclmul mlx_compat(OE) crct10dif_common raid_class crc32c_intel libata ptp megaraid_sas scsi_transport_sas drm_panel_orientation_quirks pps_core [last unloaded: libcfs] [236015.458391] CPU: 40 PID: 70187 Comm: mdt_rdpg00_014 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 [236015.458391] Hardware name: Dell Inc. PowerEdge R6415/07YXFK, BIOS 1.10.6 08/15/2019 [236015.458392] task: ffff9a8e81a71040 ti: ffff9a612d798000 task.ti: ffff9a612d798000 [236015.458395] RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 [236015.458396] RSP: 0018:ffff9a612d79b6c0 EFLAGS: 00000246 [236015.458396] RAX: 0000000000000000 RBX: ffff9a83ed279680 RCX: 0000000001410000 [236015.458397] RDX: ffff9a71bf71b780 RSI: 0000000000890101 RDI: ffff9a7b0c800c80 [236015.458398] RBP: ffff9a612d79b6c0 R08: ffff9a61bf09b780 R09: 0000000000000000 [236015.458398] R10: ffff9a61bf09f140 R11: fffff4b46a5d6a00 R12: ffff9a612d79b690 [236015.458399] R13: ffff9a81ac460501 R14: 0000000000000001 R15: ffffffffc050b2d1 [236015.458400] FS: 00007f43755cd740(0000) GS:ffff9a61bf080000(0000) knlGS:0000000000000000 [236015.458401] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [236015.458402] CR2: 00007f43751b81cc CR3: 0000003c68610000 CR4: 00000000003407e0 [236015.458403] Call Trace: [236015.458405] [] queued_spin_lock_slowpath+0xb/0xf [236015.458406] [] _raw_spin_lock+0x20/0x30 [236015.458417] [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] [236015.458424] [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] [236015.458434] [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] [236015.458436] [] ? __brelse+0x3d/0x50 [236015.458444] [] ldiskfs_getblk+0x65/0x200 [ldiskfs] [236015.458453] [] ldiskfs_bread+0x27/0xc0 [ldiskfs] [236015.458455] [] ? __find_get_block+0xbc/0x120 [236015.458462] [] __ldiskfs_read_dirblock+0x4a/0x400 [ldiskfs] [236015.458469] [] htree_dirblock_to_tree+0x40/0x190 [ldiskfs] [236015.458476] [] ldiskfs_htree_fill_tree+0xb5/0x2f0 [ldiskfs] [236015.458479] [] ? kmem_cache_alloc_trace+0x1d6/0x200 [236015.458484] [] ldiskfs_readdir+0x61c/0x850 [ldiskfs] [236015.458492] [] ? osd_object_alloc+0x360/0x360 [osd_ldiskfs] [236015.458493] [] ? generic_getxattr+0x52/0x70 [236015.458500] [] osd_ldiskfs_it_fill+0xbe/0x260 [osd_ldiskfs] [236015.458507] [] osd_it_ea_load+0x37/0x100 [osd_ldiskfs] [236015.458516] [] lod_it_load+0x27/0x90 [lod] [236015.458538] [] dt_index_walk+0xf8/0x430 [obdclass] [236015.458546] [] ? 
mdd_object_lock+0xe0/0xe0 [mdd] [236015.458554] [] mdd_readpage+0x25f/0x5a0 [mdd] [236015.458565] [] mdt_readpage+0x63a/0x880 [mdt] [236015.458606] [] tgt_request_handle+0xaea/0x1580 [ptlrpc] [236015.458644] [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [236015.458650] [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [236015.458684] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [236015.458719] [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [236015.458720] [] ? __wake_up+0x44/0x50 [236015.458754] [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [236015.458787] [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [236015.458789] [] kthread+0xd1/0xe0 [236015.458791] [] ? insert_kthread_work+0x40/0x40 [236015.458793] [] ret_from_fork_nospec_begin+0xe/0x21 [236015.458794] [] ? insert_kthread_work+0x40/0x40 [236015.458813] Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf b4 b4 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b [236015.495351] NMI watchdog: BUG: soft lockup - CPU#43 stuck for 22s! [ldlm_cn03_012:74315] [236015.495377] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ses enclosure ghash_clmulni_intel dcdbas aesni_intel lrw gf128mul glue_helper ablk_helper ipmi_si cryptd sg ipmi_devintf pcspkr ccp ipmi_msghandler i2c_piix4 k10temp dm_multipath acpi_power_meter dm_mod ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) [236015.495386] ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt mlx5_core(OE) fb_sys_fops ttm mlxfw(OE) devlink ahci libahci mpt3sas(OE) drm tg3 crct10dif_pclmul mlx_compat(OE) crct10dif_common raid_class crc32c_intel libata ptp megaraid_sas scsi_transport_sas drm_panel_orientation_quirks pps_core [last unloaded: libcfs] [236015.495388] CPU: 43 PID: 74315 Comm: ldlm_cn03_012 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 [236015.495389] Hardware name: Dell Inc. 
PowerEdge R6415/07YXFK, BIOS 1.10.6 08/15/2019 [236015.495390] task: ffff9a5ad7bc5140 ti: ffff9a53a6628000 task.ti: ffff9a53a6628000 [236015.495393] RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 [236015.495394] RSP: 0018:ffff9a53a662b880 EFLAGS: 00000246 [236015.495395] RAX: 0000000000000000 RBX: ffff9a5186de2038 RCX: 0000000001590000 [236015.495396] RDX: ffff9a61bef5b780 RSI: 0000000000a10101 RDI: ffff9a7b0c800c80 [236015.495397] RBP: ffff9a53a662b880 R08: ffff9a91ff69b780 R09: 0000000000000000 [236015.495397] R10: ffff9a91ff69f0c0 R11: fffff4b5262693c0 R12: 0000000000000000 [236015.495398] R13: ffffffffc159c26c R14: ffff9a53a662b810 R15: ffff9a53a662b8c0 [236015.495399] FS: 00007f417e588700(0000) GS:ffff9a91ff680000(0000) knlGS:0000000000000000 [236015.495400] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [236015.495401] CR2: 00007f41828ec000 CR3: 0000003c68610000 CR4: 00000000003407e0 [236015.495402] Call Trace: [236015.495404] [] queued_spin_lock_slowpath+0xb/0xf [236015.495406] [] _raw_spin_lock+0x20/0x30 [236015.495418] [] ldiskfs_es_lru_del+0x32/0x70 [ldiskfs] [236015.495429] [] ldiskfs_clear_inode+0x41/0x90 [ldiskfs] [236015.495438] [] ldiskfs_free_inode+0x10b/0x610 [ldiskfs] [236015.495448] [] ? ldiskfs_mark_iloc_dirty+0x68/0x80 [ldiskfs] [236015.495455] [] ? ldiskfs_orphan_del+0x171/0x240 [ldiskfs] [236015.495465] [] ? ldiskfs_evict_inode+0x472/0x630 [ldiskfs] [236015.495473] [] ? ldiskfs_mark_inode_dirty+0x6f/0x210 [ldiskfs] [236015.495482] [] ldiskfs_evict_inode+0x57d/0x630 [ldiskfs] [236015.495484] [] evict+0xb4/0x180 [236015.495485] [] iput+0xfc/0x190 [236015.495494] [] osd_object_delete+0x1d2/0x330 [osd_ldiskfs] [236015.495515] [] lu_object_free.isra.32+0x68/0x170 [obdclass] [236015.495523] [] ? cfs_hash_bd_from_key+0x38/0xb0 [libcfs] [236015.495543] [] lu_object_put+0xc5/0x3d0 [obdclass] [236015.495558] [] ? mdt_punch_hpreq_fini+0x10/0x10 [mdt] [236015.495571] [] ldlm_dom_discard_cp_ast+0xb1/0x2b0 [mdt] [236015.495599] [] ldlm_work_cp_ast_lock+0xa8/0x1d0 [ptlrpc] [236015.495631] [] ptlrpc_set_wait+0x72/0x790 [ptlrpc] [236015.495633] [] ? cpumask_next_and+0x35/0x50 [236015.495635] [] ? kmem_cache_alloc_node_trace+0x11d/0x210 [236015.495654] [] ? lprocfs_counter_add+0xf9/0x160 [obdclass] [236015.495680] [] ? ldlm_work_gl_ast_lock+0x3a0/0x3a0 [ptlrpc] [236015.495711] [] ? ptlrpc_prep_set+0xd2/0x280 [ptlrpc] [236015.495737] [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] [236015.495764] [] __ldlm_reprocess_all+0x11f/0x360 [ptlrpc] [236015.495791] [] ldlm_reprocess_all+0x13/0x20 [ptlrpc] [236015.495820] [] ldlm_request_cancel+0x42f/0x780 [ptlrpc] [236015.495849] [] ldlm_handle_cancel+0x232/0x2b0 [ptlrpc] [236015.495878] [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] [236015.495912] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [236015.495945] [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [236015.495947] [] ? __wake_up+0x44/0x50 [236015.495980] [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [236015.495981] [] ? __schedule+0x42a/0x860 [236015.496013] [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [236015.496015] [] kthread+0xd1/0xe0 [236015.496017] [] ? finish_task_switch+0x54/0x1c0 [236015.496018] [] ? insert_kthread_work+0x40/0x40 [236015.496020] [] ret_from_fork_nospec_begin+0xe/0x21 [236015.496022] [] ? 
insert_kthread_work+0x40/0x40 [236015.496039] Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf b4 b4 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b [236015.505351] NMI watchdog: BUG: soft lockup - CPU#45 stuck for 22s! [mdt_rdpg01_018:70184] [236015.505386] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ses enclosure ghash_clmulni_intel dcdbas aesni_intel lrw gf128mul glue_helper ablk_helper ipmi_si cryptd sg ipmi_devintf pcspkr ccp ipmi_msghandler i2c_piix4 k10temp dm_multipath acpi_power_meter dm_mod ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) [236015.505397] ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt mlx5_core(OE) fb_sys_fops ttm mlxfw(OE) devlink ahci libahci mpt3sas(OE) drm tg3 crct10dif_pclmul mlx_compat(OE) crct10dif_common raid_class crc32c_intel libata ptp megaraid_sas scsi_transport_sas drm_panel_orientation_quirks pps_core [last unloaded: libcfs] [236015.505400] CPU: 45 PID: 70184 Comm: mdt_rdpg01_018 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 [236015.505400] Hardware name: Dell Inc. PowerEdge R6415/07YXFK, BIOS 1.10.6 08/15/2019 [236015.505401] task: ffff9a71b96d6180 ti: ffff9a6f0edd8000 task.ti: ffff9a6f0edd8000 [236015.505406] RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x15e/0x200 [236015.505407] RSP: 0018:ffff9a6f0eddb6c0 EFLAGS: 00000212 [236015.505408] RAX: 0000000000000101 RBX: ffff9a83ed279618 RCX: 0000000001690000 [236015.505409] RDX: 0000000000390101 RSI: 0000000000000101 RDI: ffff9a7b0c800c80 [236015.505410] RBP: ffff9a6f0eddb6c0 R08: ffff9a71bf8db780 R09: 0000000000000000 [236015.505410] R10: ffff9a71bf8df140 R11: fffff4b490cfca00 R12: ffff9a6f0eddb690 [236015.505411] R13: ffff9a81ac460501 R14: 0000000000000001 R15: ffffffffc050b2d1 [236015.505412] FS: 00007f0ff848c740(0000) GS:ffff9a71bf8c0000(0000) knlGS:0000000000000000 [236015.505413] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [236015.505414] CR2: 0000000001535108 CR3: 0000003c68610000 CR4: 00000000003407e0 [236015.505415] Call Trace: [236015.505418] [] queued_spin_lock_slowpath+0xb/0xf [236015.505420] [] _raw_spin_lock+0x20/0x30 [236015.505432] [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] [236015.505439] [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] [236015.505449] [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] [236015.505453] [] ? __brelse+0x3d/0x50 [236015.505462] [] ldiskfs_getblk+0x65/0x200 [ldiskfs] [236015.505470] [] ldiskfs_bread+0x27/0xc0 [ldiskfs] [236015.505472] [] ? __find_get_block+0xbc/0x120 [236015.505479] [] __ldiskfs_read_dirblock+0x4a/0x400 [ldiskfs] [236015.505486] [] htree_dirblock_to_tree+0x40/0x190 [ldiskfs] [236015.505494] [] ldiskfs_htree_fill_tree+0xb5/0x2f0 [ldiskfs] [236015.505500] [] ? ldiskfs_readdir+0x799/0x850 [ldiskfs] [236015.505503] [] ? kmem_cache_alloc_trace+0x1d6/0x200 [236015.505509] [] ldiskfs_readdir+0x61c/0x850 [ldiskfs] [236015.505518] [] ? 
osd_object_alloc+0x360/0x360 [osd_ldiskfs] [236015.505520] [] ? generic_getxattr+0x52/0x70 [236015.505527] [] osd_ldiskfs_it_fill+0xbe/0x260 [osd_ldiskfs] [236015.505534] [] osd_it_ea_load+0x37/0x100 [osd_ldiskfs] [236015.505545] [] lod_it_load+0x27/0x90 [lod] [236015.505571] [] dt_index_walk+0xf8/0x430 [obdclass] [236015.505582] [] ? mdd_object_lock+0xe0/0xe0 [mdd] [236015.505590] [] mdd_readpage+0x25f/0x5a0 [mdd] [236015.505604] [] mdt_readpage+0x63a/0x880 [mdt] [236015.505646] [] tgt_request_handle+0xaea/0x1580 [ptlrpc] [236015.505684] [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [236015.505690] [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [236015.505725] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [236015.505759] [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [236015.505761] [] ? __wake_up+0x44/0x50 [236015.505795] [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [236015.505828] [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [236015.505830] [] kthread+0xd1/0xe0 [236015.505831] [] ? insert_kthread_work+0x40/0x40 [236015.505834] [] ret_from_fork_nospec_begin+0xe/0x21 [236015.505835] [] ? insert_kthread_work+0x40/0x40 [236015.505854] Code: 0f 18 09 8b 17 0f b7 c2 85 c0 74 21 83 f8 03 75 10 eb 1a 66 2e 0f 1f 84 00 00 00 00 00 85 c0 74 0c f3 90 8b 17 0f b7 c2 83 f8 03 <75> f0 be 01 00 00 00 eb 15 66 0f 1f 84 00 00 00 00 00 89 d0 f0 [236015.514354] NMI watchdog: BUG: soft lockup - CPU#46 stuck for 22s! [mdt_rdpg02_007:68134] [236015.514381] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ses enclosure ghash_clmulni_intel dcdbas aesni_intel lrw gf128mul glue_helper ablk_helper ipmi_si cryptd sg ipmi_devintf pcspkr ccp ipmi_msghandler i2c_piix4 k10temp dm_multipath acpi_power_meter dm_mod ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) [236015.514390] ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt mlx5_core(OE) fb_sys_fops ttm mlxfw(OE) devlink ahci libahci mpt3sas(OE) drm tg3 crct10dif_pclmul mlx_compat(OE) crct10dif_common raid_class crc32c_intel libata ptp megaraid_sas scsi_transport_sas drm_panel_orientation_quirks pps_core [last unloaded: libcfs] [236015.514392] CPU: 46 PID: 68134 Comm: mdt_rdpg02_007 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 [236015.514393] Hardware name: Dell Inc. 
PowerEdge R6415/07YXFK, BIOS 1.10.6 08/15/2019 [236015.514394] task: ffff9a8c976b9040 ti: ffff9a71b5344000 task.ti: ffff9a71b5344000 [236015.514397] RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 [236015.514398] RSP: 0018:ffff9a71b5347678 EFLAGS: 00000246 [236015.514399] RAX: 0000000000000000 RBX: 00001bb800000002 RCX: 0000000001710000 [236015.514399] RDX: ffff9a71bf81b780 RSI: 0000000001090101 RDI: ffff9a7b0c800c80 [236015.514400] RBP: ffff9a71b5347678 R08: ffff9a81bf8db780 R09: 0000000000000000 [236015.514401] R10: ffff9a81bf8df140 R11: fffff4b4f7682400 R12: ffff9a583ea50800 [236015.514401] R13: ffff9a61b5290000 R14: ffff9a71b5347908 R15: ffff9a583ea50b28 [236015.514403] FS: 00007fc46f387880(0000) GS:ffff9a81bf8c0000(0000) knlGS:0000000000000000 [236015.514403] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [236015.514404] CR2: 00007f41828ed000 CR3: 0000004035a96000 CR4: 00000000003407e0 [236015.514405] Call Trace: [236015.514407] [] queued_spin_lock_slowpath+0xb/0xf [236015.514409] [] _raw_spin_lock+0x20/0x30 [236015.514420] [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] [236015.514427] [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] [236015.514437] [] ? osd_write+0x15b/0x5c0 [osd_ldiskfs] [236015.514447] [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] [236015.514457] [] ? lod_sub_write+0x1d0/0x410 [lod] [236015.514464] [] ? __jbd2_journal_file_buffer+0x91/0x220 [jbd2] [236015.514473] [] ldiskfs_getblk+0x65/0x200 [ldiskfs] [236015.514481] [] ldiskfs_bread+0x27/0xc0 [ldiskfs] [236015.514488] [] __ldiskfs_read_dirblock+0x4a/0x400 [ldiskfs] [236015.514496] [] dx_probe+0xa2/0xa20 [ldiskfs] [236015.514504] [] ? __ldiskfs_get_inode_loc+0xe3/0x3c0 [ldiskfs] [236015.514510] [] ? ldiskfs_xattr_find_entry+0x9f/0x130 [ldiskfs] [236015.514517] [] ldiskfs_htree_fill_tree+0x199/0x2f0 [ldiskfs] [236015.514523] [] ldiskfs_readdir+0x61c/0x850 [ldiskfs] [236015.514530] [] ? osd_object_alloc+0x360/0x360 [osd_ldiskfs] [236015.514532] [] ? generic_getxattr+0x52/0x70 [236015.514539] [] osd_ldiskfs_it_fill+0xbe/0x260 [osd_ldiskfs] [236015.514546] [] osd_it_ea_load+0x37/0x100 [osd_ldiskfs] [236015.514554] [] lod_it_load+0x27/0x90 [lod] [236015.514576] [] dt_index_walk+0xf8/0x430 [obdclass] [236015.514584] [] ? mdd_object_lock+0xe0/0xe0 [mdd] [236015.514593] [] mdd_readpage+0x25f/0x5a0 [mdd] [236015.514603] [] mdt_readpage+0x63a/0x880 [mdt] [236015.514644] [] tgt_request_handle+0xaea/0x1580 [ptlrpc] [236015.514682] [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [236015.514688] [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [236015.514722] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [236015.514756] [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [236015.514758] [] ? __wake_up+0x44/0x50 [236015.514792] [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [236015.514794] [] ? __schedule+0x42a/0x860 [236015.514827] [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [236015.514828] [] kthread+0xd1/0xe0 [236015.514830] [] ? insert_kthread_work+0x40/0x40 [236015.514832] [] ret_from_fork_nospec_begin+0xe/0x21 [236015.514833] [] ? 
insert_kthread_work+0x40/0x40 [236015.514852] Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf b4 b4 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b [236018.416441] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.204@o2ib7: 16 seconds [236018.416444] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 198 previous similar messages [236019.236445] NMI watchdog: BUG: soft lockup - CPU#14 stuck for 23s! [mdt_rdpg02_001:67229] [236019.236472] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ses enclosure ghash_clmulni_intel dcdbas aesni_intel lrw gf128mul glue_helper ablk_helper ipmi_si cryptd sg ipmi_devintf pcspkr ccp ipmi_msghandler i2c_piix4 k10temp dm_multipath acpi_power_meter dm_mod ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) [236019.236481] ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt mlx5_core(OE) fb_sys_fops ttm mlxfw(OE) devlink ahci libahci mpt3sas(OE) drm tg3 crct10dif_pclmul mlx_compat(OE) crct10dif_common raid_class crc32c_intel libata ptp megaraid_sas scsi_transport_sas drm_panel_orientation_quirks pps_core [last unloaded: libcfs] [236019.236484] CPU: 14 PID: 67229 Comm: mdt_rdpg02_001 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 [236019.236484] Hardware name: Dell Inc. PowerEdge R6415/07YXFK, BIOS 1.10.6 08/15/2019 [236019.236485] task: ffff9a619d7fd140 ti: ffff9a81a488c000 task.ti: ffff9a81a488c000 [236019.236489] RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 [236019.236489] RSP: 0018:ffff9a81a488f678 EFLAGS: 00000246 [236019.236490] RAX: 0000000000000000 RBX: 000002fa00000002 RCX: 0000000000710000 [236019.236491] RDX: ffff9a61beedb780 RSI: 0000000000610101 RDI: ffff9a7b0c800c80 [236019.236492] RBP: ffff9a81a488f678 R08: ffff9a81bf6db780 R09: 0000000000000000 [236019.236492] R10: ffff9a81bf6df140 R11: fffff4b4f551fc00 R12: ffff9a583ea50800 [236019.236493] R13: ffff9a61b5290000 R14: ffff9a81a488f908 R15: ffff9a583ea50b28 [236019.236494] FS: 00007f418058c700(0000) GS:ffff9a81bf6c0000(0000) knlGS:0000000000000000 [236019.236495] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [236019.236496] CR2: 00007f41828ed000 CR3: 0000003c68610000 CR4: 00000000003407e0 [236019.236497] Call Trace: [236019.236499] [] queued_spin_lock_slowpath+0xb/0xf [236019.236502] [] _raw_spin_lock+0x20/0x30 [236019.236513] [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] [236019.236520] [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] [236019.236530] [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] [236019.236539] [] ldiskfs_getblk+0x65/0x200 [ldiskfs] [236019.236548] [] ldiskfs_bread+0x27/0xc0 [ldiskfs] [236019.236555] [] __ldiskfs_read_dirblock+0x4a/0x400 [ldiskfs] [236019.236562] [] dx_probe+0xa2/0xa20 [ldiskfs] [236019.236571] [] ? __ldiskfs_get_inode_loc+0xe3/0x3c0 [ldiskfs] [236019.236577] [] ? 
ldiskfs_xattr_find_entry+0x9f/0x130 [ldiskfs] [236019.236584] [] ldiskfs_htree_fill_tree+0x199/0x2f0 [ldiskfs] [236019.236590] [] ldiskfs_readdir+0x61c/0x850 [ldiskfs] [236019.236598] [] ? osd_object_alloc+0x360/0x360 [osd_ldiskfs] [236019.236600] [] ? generic_getxattr+0x52/0x70 [236019.236607] [] osd_ldiskfs_it_fill+0xbe/0x260 [osd_ldiskfs] [236019.236614] [] osd_it_ea_load+0x37/0x100 [osd_ldiskfs] [236019.236624] [] lod_it_load+0x27/0x90 [lod] [236019.236648] [] dt_index_walk+0xf8/0x430 [obdclass] [236019.236658] [] ? mdd_object_lock+0xe0/0xe0 [mdd] [236019.236666] [] mdd_readpage+0x25f/0x5a0 [mdd] [236019.236678] [] mdt_readpage+0x63a/0x880 [mdt] [236019.236721] [] tgt_request_handle+0xaea/0x1580 [ptlrpc] [236019.236759] [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [236019.236766] [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [236019.236802] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [236019.236854] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ses enclosure ghash_clmulni_intel dcdbas aesni_intel lrw gf128mul glue_helper ablk_helper ipmi_si cryptd sg ipmi_devintf pcspkr ccp ipmi_msghandler i2c_piix4 k10temp dm_multipath acpi_power_meter dm_mod ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) [236019.236864] ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt mlx5_core(OE) fb_sys_fops ttm mlxfw(OE) devlink ahci libahci mpt3sas(OE) drm tg3 crct10dif_pclmul mlx_compat(OE) crct10dif_common raid_class crc32c_intel libata ptp megaraid_sas scsi_transport_sas drm_panel_orientation_quirks pps_core [last unloaded: libcfs] [236019.236867] CPU: 0 PID: 70180 Comm: mdt_rdpg00_011 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 [236019.236867] Hardware name: Dell Inc. PowerEdge R6415/07YXFK, BIOS 1.10.6 08/15/2019 [236019.236868] task: ffff9a7000f3b0c0 ti: ffff9a6d3f2dc000 task.ti: ffff9a6d3f2dc000 [236019.236872] RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 [236019.236873] RSP: 0018:ffff9a6d3f2df6c0 EFLAGS: 00000246 [236019.236874] RAX: 0000000000000000 RBX: ffff9a61bef9ab80 RCX: 0000000000010000 [236019.236875] RDX: ffff9a61bf09b780 RSI: 0000000001410101 RDI: ffff9a7b0c800c80 [236019.236876] RBP: ffff9a6d3f2df6c0 R08: ffff9a61bee1b780 R09: 0000000000000000 [236019.236876] R10: ffff9a61bee1f140 R11: fffff4b46a5d7400 R12: ffff9a7000f3b758 [236019.236877] R13: 000000013f2df638 R14: ffff9a61bef80000 R15: ffffffffb3e2a59e [236019.236879] FS: 00007fe2c4310900(0000) GS:ffff9a61bee00000(0000) knlGS:0000000000000000 [236019.236880] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [236019.236881] CR2: 00007fa5b4119028 CR3: 000000102960c000 CR4: 00000000003407f0 [236019.236881] Call Trace: [236019.236884] [] queued_spin_lock_slowpath+0xb/0xf [236019.236887] [] _raw_spin_lock+0x20/0x30 [236019.236899] [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] [236019.236907] [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] [236019.236917] [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] [236019.236919] [] ? 
__brelse+0x3d/0x50 [236019.236928] [] ldiskfs_getblk+0x65/0x200 [ldiskfs] [236019.236937] [] ldiskfs_bread+0x27/0xc0 [ldiskfs] [236019.236939] [] ? __find_get_block+0xbc/0x120 [236019.236946] [] __ldiskfs_read_dirblock+0x4a/0x400 [ldiskfs] [236019.236953] [] htree_dirblock_to_tree+0x40/0x190 [ldiskfs] [236019.236960] [] ldiskfs_htree_fill_tree+0xb5/0x2f0 [ldiskfs] [236019.236963] [] ? kmem_cache_alloc_trace+0x1d6/0x200 [236019.236969] [] ldiskfs_readdir+0x61c/0x850 [ldiskfs] [236019.236977] [] ? osd_object_alloc+0x360/0x360 [osd_ldiskfs] [236019.236978] [] ? generic_getxattr+0x52/0x70 [236019.236985] [] osd_ldiskfs_it_fill+0xbe/0x260 [osd_ldiskfs] [236019.236992] [] osd_it_ea_load+0x37/0x100 [osd_ldiskfs] [236019.237003] [] lod_it_load+0x27/0x90 [lod] [236019.237028] [] dt_index_walk+0xf8/0x430 [obdclass] [236019.237038] [] ? mdd_object_lock+0xe0/0xe0 [mdd] [236019.237046] [] mdd_readpage+0x25f/0x5a0 [mdd] [236019.237058] [] mdt_readpage+0x63a/0x880 [mdt] [236019.237105] [] tgt_request_handle+0xaea/0x1580 [ptlrpc] [236019.237143] [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [236019.237150] [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [236019.237185] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [236019.237219] [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [236019.237221] [] ? __wake_up+0x44/0x50 [236019.237255] [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [236019.237257] [] ? __schedule+0x42a/0x860 [236019.237290] [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [236019.237292] [] kthread+0xd1/0xe0 [236019.237293] [] ? insert_kthread_work+0x40/0x40 [236019.237296] [] ret_from_fork_nospec_begin+0xe/0x21 [236019.237297] [] ? insert_kthread_work+0x40/0x40 [236019.237316] Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf b4 b4 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b [236019.270447] NMI watchdog: BUG: soft lockup - CPU#18 stuck for 23s! [mdt_rdpg02_013:68185] [236019.270490] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ses enclosure ghash_clmulni_intel dcdbas aesni_intel lrw gf128mul glue_helper ablk_helper ipmi_si cryptd sg ipmi_devintf pcspkr ccp ipmi_msghandler i2c_piix4 k10temp dm_multipath acpi_power_meter dm_mod ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) [236019.270506] ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt mlx5_core(OE) fb_sys_fops ttm mlxfw(OE) devlink ahci libahci mpt3sas(OE) drm tg3 crct10dif_pclmul mlx_compat(OE) crct10dif_common raid_class crc32c_intel libata ptp megaraid_sas scsi_transport_sas drm_panel_orientation_quirks pps_core [last unloaded: libcfs] [236019.270509] CPU: 18 PID: 68185 Comm: mdt_rdpg02_013 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 [236019.270509] Hardware name: Dell Inc. 
PowerEdge R6415/07YXFK, BIOS 1.10.6 08/15/2019 [236019.270511] task: ffff9a7000f3d140 ti: ffff9a71a2b08000 task.ti: ffff9a71a2b08000 [236019.270518] RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 [236019.270520] RSP: 0018:ffff9a71a2b0b6c0 EFLAGS: 00000246 [236019.270521] RAX: 0000000000000000 RBX: ffff9a81bf65ab80 RCX: 0000000000910000 [236019.270521] RDX: ffff9a81bf6db780 RSI: 0000000000710101 RDI: ffff9a7b0c800c80 [236019.270522] RBP: ffff9a71a2b0b6c0 R08: ffff9a81bf71b780 R09: 0000000000000000 [236019.270524] R10: ffff9a81bf71f140 R11: fffff4b4d6f11800 R12: ffff9a7000f3d7d8 [236019.270524] R13: 00000001a2b0b638 R14: ffff9a81bf640000 R15: ffffffffb3e2a59e [236019.270526] FS: 00007fe2c4310900(0000) GS:ffff9a81bf700000(0000) knlGS:0000000000000000 [236019.270527] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [236019.270527] CR2: 00007fd90a456000 CR3: 000000102960c000 CR4: 00000000003407e0 [236019.270528] Call Trace: [236019.270534] [] queued_spin_lock_slowpath+0xb/0xf [236019.270538] [] _raw_spin_lock+0x20/0x30 [236019.270553] [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] [236019.270561] [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] [236019.270569] [] ? dm_old_request_fn+0xcc/0x210 [dm_mod] [236019.270573] [] ? iova_rcache_get+0xba/0x140 [236019.270584] [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] [236019.270588] [] ? __brelse+0x3d/0x50 [236019.270598] [] ldiskfs_getblk+0x65/0x200 [ldiskfs] [236019.270608] [] ldiskfs_bread+0x27/0xc0 [ldiskfs] [236019.270610] [] ? __find_get_block+0xbc/0x120 [236019.270617] [] __ldiskfs_read_dirblock+0x4a/0x400 [ldiskfs] [236019.270624] [] htree_dirblock_to_tree+0x40/0x190 [ldiskfs] [236019.270631] [] ldiskfs_htree_fill_tree+0xb5/0x2f0 [ldiskfs] [236019.270636] [] ? kmem_cache_alloc_trace+0x1d6/0x200 [236019.270641] [] ldiskfs_readdir+0x61c/0x850 [ldiskfs] [236019.270649] [] ? osd_object_alloc+0x360/0x360 [osd_ldiskfs] [236019.270652] [] ? generic_getxattr+0x52/0x70 [236019.270659] [] osd_ldiskfs_it_fill+0xbe/0x260 [osd_ldiskfs] [236019.270666] [] osd_it_ea_load+0x37/0x100 [osd_ldiskfs] [236019.270678] [] lod_it_load+0x27/0x90 [lod] [236019.270710] [] dt_index_walk+0xf8/0x430 [obdclass] [236019.270722] [] ? mdd_object_lock+0xe0/0xe0 [mdd] [236019.270731] [] mdd_readpage+0x25f/0x5a0 [mdd] [236019.270743] [] mdt_readpage+0x63a/0x880 [mdt] [236019.270801] [] tgt_request_handle+0xaea/0x1580 [ptlrpc] [236019.270838] [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [236019.270847] [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [236019.270882] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [236019.270916] [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [236019.270921] [] ? __wake_up+0x44/0x50 [236019.270954] [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [236019.270987] [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [236019.270992] [] kthread+0xd1/0xe0 [236019.270993] [] ? insert_kthread_work+0x40/0x40 [236019.270996] [] ret_from_fork_nospec_begin+0xe/0x21 [236019.270998] [] ? insert_kthread_work+0x40/0x40 [236019.271016] Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf b4 b4 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b [236019.467452] NMI watchdog: BUG: soft lockup - CPU#41 stuck for 23s! 
[mdt_io01_026:67991] [236019.467481] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ses enclosure ghash_clmulni_intel dcdbas aesni_intel lrw gf128mul glue_helper ablk_helper ipmi_si cryptd sg ipmi_devintf pcspkr ccp ipmi_msghandler i2c_piix4 k10temp dm_multipath acpi_power_meter dm_mod ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) [236019.467491] ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt mlx5_core(OE) fb_sys_fops ttm mlxfw(OE) devlink ahci libahci mpt3sas(OE) drm tg3 crct10dif_pclmul mlx_compat(OE) crct10dif_common raid_class crc32c_intel libata ptp megaraid_sas scsi_transport_sas drm_panel_orientation_quirks pps_core [last unloaded: libcfs] [236019.467493] CPU: 41 PID: 67991 Comm: mdt_io01_026 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 [236019.467493] Hardware name: Dell Inc. PowerEdge R6415/07YXFK, BIOS 1.10.6 08/15/2019 [236019.467494] task: ffff9a612bbed140 ti: ffff9a6097e40000 task.ti: ffff9a6097e40000 [236019.467498] RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 [236019.467499] RSP: 0018:ffff9a6097e43800 EFLAGS: 00000246 [236019.467499] RAX: 0000000000000000 RBX: ffff9a86afe22d60 RCX: 0000000001490000 [236019.467500] RDX: ffff9a81bf71b780 RSI: 0000000000910101 RDI: ffff9a7b0c800c80 [236019.467501] RBP: ffff9a6097e43800 R08: ffff9a71bf89b780 R09: 0000000000000000 [236019.467501] R10: ffff9a71bf89f140 R11: fffff4b4c72f9400 R12: 0000000000000000 [236019.467502] R13: ffff9a6097e437a0 R14: ffff9a86afe22ac8 R15: 0000000000000000 [236019.467503] FS: 00007f32e0150700(0000) GS:ffff9a71bf880000(0000) knlGS:0000000000000000 [236019.467504] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [236019.467505] CR2: 00007f32e0224000 CR3: 0000003c68610000 CR4: 00000000003407e0 [236019.467506] Call Trace: [236019.467508] [] queued_spin_lock_slowpath+0xb/0xf [236019.467511] [] _raw_spin_lock+0x20/0x30 [236019.467522] [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] [236019.467529] [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] [236019.467532] [] ? ktime_get_ts64+0x52/0xf0 [236019.467534] [] ? ktime_get+0x52/0xe0 [236019.467541] [] ? kiblnd_post_tx_locked+0x7bb/0xa50 [ko2iblnd] [236019.467551] [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] [236019.467560] [] ? cfs_hash_bd_lookup_intent+0x63/0x170 [libcfs] [236019.467561] [] ? ktime_get_ts64+0x52/0xf0 [236019.467574] [] osd_ldiskfs_map_inode_pages+0x143/0x420 [osd_ldiskfs] [236019.467584] [] osd_write_prep+0x2b6/0x360 [osd_ldiskfs] [236019.467606] [] mdt_obd_preprw+0x65b/0x10a0 [mdt] [236019.467648] [] tgt_brw_write+0xc7c/0x1cf0 [ptlrpc] [236019.467650] [] ? __slab_free+0x81/0x2f0 [236019.467654] [] ? update_curr+0x14c/0x1e0 [236019.467655] [] ? __enqueue_entity+0x78/0x80 [236019.467656] [] ? enqueue_entity+0x2ef/0xbe0 [236019.467696] [] ? tgt_lookup_reply+0x2d/0x190 [ptlrpc] [236019.467735] [] tgt_request_handle+0xaea/0x1580 [ptlrpc] [236019.467771] [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [236019.467778] [] ? 
ktime_get_real_seconds+0xe/0x10 [libcfs]
[236019.467813] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[236019.467846] [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc]
[236019.467848] [] ? __wake_up+0x44/0x50
[236019.467882] [] ptlrpc_main+0xb2c/0x1460 [ptlrpc]
[236019.467915] [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc]
[236019.467917] [] kthread+0xd1/0xe0
[236019.467918] [] ? insert_kthread_work+0x40/0x40
[236019.467920] [] ret_from_fork_nospec_begin+0xe/0x21
[236019.467922] [] ? insert_kthread_work+0x40/0x40
[236019.467941] Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf b4 b4 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b
[236021.170367] Lustre: mdt_readpage: This server is not able to keep up with request traffic (cpu-bound).
[236021.170373] Lustre: 68136:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=3 reqQ=0 recA=9, svcEst=2, delay=29291
[236021.170377] Lustre: 68136:0:(service.c:1541:ptlrpc_at_check_timed()) Skipped 1 previous similar message
[236021.170386] Lustre: 68027:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-24s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff9a5dc759e850 x1648681093884064/t0(0) o3->23e532ac-5cec-15dc-e56e-ef1fad067124@10.8.0.82@o2ib6:195/0 lens 488/440 e 0 to 0 dl 1573068365 ref 2 fl Interpret:/0/0 rc 0/0
[236021.170581] Lustre: 68130:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 25s req@ffff9a7ad41d9b00 x1649307205522784/t0(0) o35->8d232f07-b6ab-bc70-4dd8-277e82f65db5@10.9.107.58@o2ib4:0/0 lens 392/0 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1
[236021.170614] Lustre: 68130:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 1 previous similar message
[236021.170739] LustreError: 68144:0:(ldlm_lib.c:3205:target_bulk_io()) @@@ bulk READ failed: rc -107 req@ffff9a60928e0480 x1648442534785616/t0(0) o37->98c710cf-a183-35fe-d60d-8494e153f1c3@10.8.21.13@o2ib6:232/0 lens 448/440 e 0 to 0 dl 1573068402 ref 1 fl Interpret:/0/0 rc 0/0
[236021.170927] LustreError: 68132:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.9.0.61@o2ib4: deadline 7:23s ago req@ffff9a5c174df080 x1648775522767792/t0(0) o35->51390574-c509-f8c2-383b-446baae03d6d@10.9.0.61@o2ib4:196/0 lens 392/0 e 0 to 0 dl 1573068366 ref 1 fl Interpret:/0/ffffffff rc 0/-1
[236021.170940] Lustre: 68132:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (7:23s); client may timeout. req@ffff9a5c174df080 x1648775522767792/t0(0) o35->51390574-c509-f8c2-383b-446baae03d6d@10.9.0.61@o2ib4:196/0 lens 392/0 e 0 to 0 dl 1573068366 ref 1 fl Interpret:/0/ffffffff rc 0/-1
[236021.172512] LustreError: 67981:0:(ldlm_lib.c:3246:target_bulk_io()) @@@ timeout on bulk READ after -24+24s req@ffff9a5dc759e850 x1648681093884064/t0(0) o3->23e532ac-5cec-15dc-e56e-ef1fad067124@10.8.0.82@o2ib6:195/0 lens 488/440 e 0 to 0 dl 1573068365 ref 1 fl Interpret:/0/0 rc 0/0
[236021.172514] LustreError: 68011:0:(ldlm_lib.c:3246:target_bulk_io()) @@@ timeout on bulk READ after -24+24s req@ffff9a6068e29050 x1648654527542416/t0(0) o3->dbef609c-d4fa-502e-524c-cd13762b4747@10.9.0.63@o2ib4:195/0 lens 488/440 e 0 to 0 dl 1573068365 ref 1 fl Interpret:/0/0 rc 0/0
[236021.172537] Lustre: fir-MDT0002: Bulk IO read error with dbef609c-d4fa-502e-524c-cd13762b4747 (at 10.9.0.63@o2ib4), client will retry: rc -110
[236021.173522] LustreError: 68141:0:(ldlm_lib.c:3256:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff9a76eecd4050 x1649400028357088/t0(0) o4->83be24ed-ef36-c298-4c93-73347c93a212@10.9.106.26@o2ib4:249/0 lens 488/448 e 2 to 0 dl 1573068419 ref 1 fl Interpret:/0/0 rc 0/0
[236021.173526] LustreError: 68141:0:(ldlm_lib.c:3256:target_bulk_io()) Skipped 2 previous similar messages
[236021.173562] Lustre: fir-MDT0002: Bulk IO write error with 83be24ed-ef36-c298-4c93-73347c93a212 (at 10.9.106.26@o2ib4), client will retry: rc = -110
[236021.173564] Lustre: Skipped 2 previous similar messages
[236021.181293] LNetError: 67096:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0)
[236021.181298] LustreError: 67096:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff9a561ec26c00
[236021.182880] Lustre: fir-MDT0002: Received new LWP connection from 10.0.10.52@o2ib7, removing former export from same NID
[236021.185371] LNet: 67085:0:(o2iblnd_cb.c:1510:kiblnd_reconnect_peer()) Abort reconnection of 10.0.10.204@o2ib7: connected
[236021.192189] LustreError: 67697:0:(tgt_handler.c:644:process_req_last_xid()) @@@ Unexpected xid 5db3359c6d700 vs. last_xid 5db3359c6e77f req@ffff9a61223e2850 x1648388479571712/t0(0) o101->54328142-7f6b-6ea1-f253-6ef62378642f@10.9.102.53@o2ib4:262/0 lens 1776/0 e 0 to 0 dl 1573068432 ref 1 fl Interpret:/2/ffffffff rc 0/-1
[236023.313551] NMI watchdog: BUG: soft lockup - CPU#23 stuck for 22s! [mdt_io03_035:92904]
[236023.313592] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ses enclosure ghash_clmulni_intel dcdbas aesni_intel lrw gf128mul glue_helper ablk_helper ipmi_si cryptd sg ipmi_devintf pcspkr ccp ipmi_msghandler i2c_piix4 k10temp dm_multipath acpi_power_meter dm_mod ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE)
[236023.313606] ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt mlx5_core(OE) fb_sys_fops ttm mlxfw(OE) devlink ahci libahci mpt3sas(OE) drm tg3 crct10dif_pclmul mlx_compat(OE) crct10dif_common raid_class crc32c_intel libata ptp megaraid_sas scsi_transport_sas drm_panel_orientation_quirks pps_core [last unloaded: libcfs]
[236023.313609] CPU: 23 PID: 92904 Comm: mdt_io03_035 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1
[236023.313609] Hardware name: Dell Inc. PowerEdge R6415/07YXFK, BIOS 1.10.6 08/15/2019
[236023.313611] task: ffff9a77027b2080 ti: ffff9a790a6ac000 task.ti: ffff9a790a6ac000
[236023.313617] RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x126/0x200
[236023.313618] RSP: 0018:ffff9a790a6af8e8 EFLAGS: 00000246
[236023.313619] RAX: 0000000000000000 RBX: ffff9a790a6af898 RCX: 0000000000b90000
[236023.313619] RDX: ffff9a71bf89b780 RSI: 0000000001490101 RDI: ffff9a7b0c800c80
[236023.313620] RBP: ffff9a790a6af8e8 R08: ffff9a91ff55b780 R09: 0000000000000000
[236023.313621] R10: 0000000000000000 R11: ffff9a8fbeb44000 R12: 0000000000000001
[236023.313621] R13: 0000000000000003 R14: 0000000000000246 R15: ffff9a8603326a38
[236023.313623] FS: 00007faddfee7740(0000) GS:ffff9a91ff540000(0000) knlGS:0000000000000000
[236023.313624] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[236023.313624] CR2: 00007f3902c350b8 CR3: 0000002034eae000 CR4: 00000000003407e0
[236023.313625] Call Trace:
[236023.313630] [] queued_spin_lock_slowpath+0xb/0xf
[236023.313633] [] _raw_spin_lock+0x20/0x30
[236023.313647] [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs]
[236023.313658] [] ldiskfs_map_blocks+0x210/0x700 [ldiskfs]
[236023.313668] [] ? cfs_hash_bd_lookup_intent+0x63/0x170 [libcfs]
[236023.313672] [] ? ktime_get_ts64+0x52/0xf0
[236023.313685] [] osd_ldiskfs_map_inode_pages+0x143/0x420 [osd_ldiskfs]
[236023.313694] [] osd_write_prep+0x2b6/0x360 [osd_ldiskfs]
[236023.313715] [] mdt_obd_preprw+0x65b/0x10a0 [mdt]
[236023.313772] [] tgt_brw_write+0xc7c/0x1cf0 [ptlrpc]
[236023.313776] [] ? __slab_free+0x81/0x2f0
[236023.313778] [] ? __enqueue_entity+0x78/0x80
[236023.313815] [] ? tgt_lookup_reply+0x2d/0x190 [ptlrpc]
[236023.313851] [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[236023.313885] [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc]
[236023.313891] [] ? ktime_get_real_seconds+0xe/0x10 [libcfs]
[236023.313924] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[236023.313956] [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc]
[236023.313960] [] ? __wake_up+0x44/0x50
[236023.313992] [] ptlrpc_main+0xb2c/0x1460 [ptlrpc]
[236023.314024] [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc]
[236023.314028] [] kthread+0xd1/0xe0
[236023.314029] [] ? insert_kthread_work+0x40/0x40
[236023.314032] [] ret_from_fork_nospec_begin+0xe/0x21
[236023.314034] [] ? insert_kthread_work+0x40/0x40
[236023.314051] Code: 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf b4 b4 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 41 8b 40 08 <85> c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b 17 0f b7 c2
[236026.415645] LNetError: 67085:0:(o2iblnd_cb.c:3350:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds
[236026.415649] LNetError: 67085:0:(o2iblnd_cb.c:3350:kiblnd_check_txs_locked()) Skipped 2 previous similar messages
[236026.415653] LNetError: 67085:0:(o2iblnd_cb.c:3425:kiblnd_check_conns()) Timed out RDMA with 10.0.10.203@o2ib7 (5): c: 6, oc: 0, rc: 8
[236026.415655] LNetError: 67085:0:(o2iblnd_cb.c:3425:kiblnd_check_conns()) Skipped 2 previous similar messages
[236027.132647] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [mdt_io02_034:92903]
[236027.132693] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ses enclosure ghash_clmulni_intel dcdbas aesni_intel lrw gf128mul glue_helper ablk_helper ipmi_si cryptd sg ipmi_devintf pcspkr ccp ipmi_msghandler i2c_piix4 k10temp dm_multipath acpi_power_meter dm_mod ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE)
[236027.132709] ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt mlx5_core(OE) fb_sys_fops ttm mlxfw(OE) devlink ahci libahci mpt3sas(OE) drm tg3 crct10dif_pclmul mlx_compat(OE) crct10dif_common raid_class crc32c_intel libata ptp megaraid_sas scsi_transport_sas drm_panel_orientation_quirks pps_core [last unloaded: libcfs]
[236027.132712] CPU: 2 PID: 92903 Comm: mdt_io02_034 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1
[236027.132713] Hardware name: Dell Inc. PowerEdge R6415/07YXFK, BIOS 1.10.6 08/15/2019
[236027.132714] task: ffff9a77027b1040 ti: ffff9a77a4988000 task.ti: ffff9a77a4988000
[236027.132723] RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x126/0x200
[236027.132724] RSP: 0018:ffff9a77a498b800 EFLAGS: 00000246
[236027.132725] RAX: 0000000000000000 RBX: ffff9a638a4a1440 RCX: 0000000000110000
[236027.132726] RDX: ffff9a91ff55b780 RSI: 0000000000b90101 RDI: ffff9a7b0c800c80
[236027.132726] RBP: ffff9a77a498b800 R08: ffff9a81bf61b780 R09: 0000000000000000
[236027.132727] R10: ffff9a81bf61f140 R11: fffff4b4e6dc4800 R12: 0000000000000000
[236027.132728] R13: ffff9a77a498b7a0 R14: ffff9a638a4a11a8 R15: 0000000000000000
[236027.132729] FS: 00007f915bb0f740(0000) GS:ffff9a81bf600000(0000) knlGS:0000000000000000
[236027.132730] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[236027.132731] CR2: 00007f41828ed000 CR3: 00000030341f0000 CR4: 00000000003407e0
[236027.132732] Call Trace:
[236027.132738] [] queued_spin_lock_slowpath+0xb/0xf
[236027.132743] [] _raw_spin_lock+0x20/0x30
[236027.132758] [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs]
[236027.132766] [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs]
[236027.132770] [] ? ktime_get_ts64+0x52/0xf0
[236027.132771] [] ? ktime_get+0x52/0xe0
[236027.132781] [] ? kiblnd_post_tx_locked+0x7bb/0xa50 [ko2iblnd]
[236027.132793] [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs]
[236027.132804] [] ? cfs_hash_bd_lookup_intent+0x63/0x170 [libcfs]
[236027.132806] [] ? ktime_get_ts64+0x52/0xf0
[236027.132819] [] osd_ldiskfs_map_inode_pages+0x143/0x420 [osd_ldiskfs]
[236027.132831] [] osd_write_prep+0x2b6/0x360 [osd_ldiskfs]
[236027.132851] [] mdt_obd_preprw+0x65b/0x10a0 [mdt]
[236027.132916] [] tgt_brw_write+0xc7c/0x1cf0 [ptlrpc]
[236027.132922] [] ? ___slab_alloc+0x209/0x4f0
[236027.132957] [] ? lustre_msg_buf_v2+0x1e0/0x1e0 [ptlrpc]
[236027.132963] [] ? __enqueue_entity+0x78/0x80
[236027.133003] [] ? tgt_lookup_reply+0x2d/0x190 [ptlrpc]
[236027.133042] [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[236027.133079] [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc]
[236027.133086] [] ? ktime_get_real_seconds+0xe/0x10 [libcfs]
[236027.133121] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[236027.133155] [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc]
[236027.133159] [] ? __wake_up+0x44/0x50
[236027.133193] [] ptlrpc_main+0xb2c/0x1460 [ptlrpc]
[236027.133227] [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc]
[236027.133231] [] kthread+0xd1/0xe0
[236027.133233] [] ? insert_kthread_work+0x40/0x40
[236027.133236] [] ret_from_fork_nospec_begin+0xe/0x21
[236027.133238] [] ? insert_kthread_work+0x40/0x40
[236027.133257] Code: 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf b4 b4 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 41 8b 40 08 <85> c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b 17 0f b7 c2
[236028.202702] Lustre: fir-OST0049-osc-MDT0002: Connection to fir-OST0049 (at 10.0.10.114@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
[236028.202704] Lustre: Skipped 1 previous similar message
[236028.204848] Lustre: fir-MDT0002: Client 3a7e0e42-33db-67bb-9fc0-74e80f2686d6 (at 10.9.110.28@o2ib4) reconnecting
[236028.204850] Lustre: Skipped 565 previous similar messages
[236030.840582] [] ? lprocfs_counter_sub+0xc1/0x130 [obdclass]
[236030.847850] [] ? ptlrpc_at_add_timed+0xe5/0x230 [ptlrpc]
[236030.854932] [] ptlrpc_main+0xb2c/0x1460 [ptlrpc]
[236030.861321] [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc]
[236030.868801] [] kthread+0xd1/0xe0
[236030.873765] [] ? insert_kthread_work+0x40/0x40
[236030.879948] [] ret_from_fork_nospec_begin+0xe/0x21
[236030.886473] [] ? insert_kthread_work+0x40/0x40
[236030.892649] Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf b4 b4 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b
[236030.913436] Lustre: 67148:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1573068390/real 0] req@ffff9a59a234b600 x1649330524357200/t0(0) o13->fir-OST001d-osc-MDT0002@10.0.10.106@o2ib7:7/4 lens 224/368 e 0 to 1 dl 1573068397 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
[236030.913442] Lustre: 73844:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 10s req@ffff9a7886e8bf00 x1648544254345984/t0(0) o103->0c2e3585-4533-41f8-c50c-67b518a2e12d@10.8.21.30@o2ib6:0/0 lens 328/0 e 0 to 0 dl 0 ref 1 fl New:/2/ffffffff rc 0/-1
[236030.913444] Lustre: 73844:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 87 previous similar messages
[236030.913510] Lustre: mdt: This server is not able to keep up with request traffic (cpu-bound).
[236030.913512] Lustre: Skipped 9 previous similar messages
[236030.913516] Lustre: 68113:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=0 reqQ=0 recA=4, svcEst=31, delay=8722
[236030.913519] Lustre: 68113:0:(service.c:1541:ptlrpc_at_check_timed()) Skipped 8 previous similar messages
[236030.913689] LustreError: 74348:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.21.30@o2ib6: deadline 6:4s ago req@ffff9a7886e8bf00 x1648544254345984/t0(0) o103->0c2e3585-4533-41f8-c50c-67b518a2e12d@10.8.21.30@o2ib6:225/0 lens 328/0 e 0 to 0 dl 1573068395 ref 2 fl Interpret:/2/ffffffff rc 0/-1
[236030.913691] LustreError: 74348:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 69 previous similar messages
[236030.913699] Lustre: 74348:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:4s); client may timeout. req@ffff9a7886e8bf00 x1648544254345984/t0(0) o103->0c2e3585-4533-41f8-c50c-67b518a2e12d@10.8.21.30@o2ib6:225/0 lens 328/0 e 0 to 0 dl 1573068395 ref 2 fl Interpret:/2/ffffffff rc 0/-1
[236030.913701] Lustre: 74348:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 75 previous similar messages
[236030.913718] Lustre: 74333:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-4s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff9a7a30ef5100 x1648633442000976/t0(0) o103->6c352811-2bb2-62d5-2b35-8b7ce4a5ffcc@10.8.21.7@o2ib6:225/0 lens 328/0 e 0 to 0 dl 1573068395 ref 2 fl New:/2/ffffffff rc 0/-1
[236030.913721] Lustre: 74333:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 75 previous similar messages
[236030.913754] LNetError: 67095:0:(peer.c:3451:lnet_peer_ni_add_to_recoveryq_locked()) lpni 10.0.10.109@o2ib7 added to recovery queue. Health = 900
[236030.913757] LNetError: 67095:0:(peer.c:3451:lnet_peer_ni_add_to_recoveryq_locked()) Skipped 1 previous similar message
[236030.913774] LustreError: 67914:0:(ldlm_lib.c:3256:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff9a750944e850 x1649400028356384/t0(0) o4->83be24ed-ef36-c298-4c93-73347c93a212@10.9.106.26@o2ib4:249/0 lens 488/448 e 2 to 0 dl 1573068419 ref 1 fl Interpret:/0/0 rc 0/0
[236030.913776] LustreError: 67914:0:(ldlm_lib.c:3256:target_bulk_io()) Skipped 20 previous similar messages
[236030.913814] Lustre: fir-MDT0002: Bulk IO write error with 83be24ed-ef36-c298-4c93-73347c93a212 (at 10.9.106.26@o2ib4), client will retry: rc = -110
[236030.913815] Lustre: Skipped 6 previous similar messages
[236030.913875] Lustre: fir-MDT0002: Connection restored to e1f8ef66-a5ef-4af2-9bdc-89d46f24c521 (at 10.9.109.10@o2ib4)
[236030.913879] Lustre: Skipped 533 previous similar messages
[236030.913913] LustreError: 67876:0:(ldlm_lib.c:3205:target_bulk_io()) @@@ bulk READ failed: rc -107 req@ffff9a5c17792880 x1648543535945872/t0(0) o37->a71a7996-5901-94d9-ec3d-82b7ac6e7689@10.8.21.26@o2ib6:232/0 lens 448/440 e 0 to 0 dl 1573068402 ref 1 fl Interpret:/0/0 rc 0/0
[236030.913915] LustreError: 67876:0:(ldlm_lib.c:3205:target_bulk_io()) Skipped 19 previous similar messages
[236030.914417] Lustre: fir-MDT0002: Received new LWP connection from 10.0.10.51@o2ib7, removing former export from same NID
[236030.914921] LNet: 67094:0:(o2iblnd_cb.c:413:kiblnd_handle_rx()) PUT_NACK from 10.0.10.51@o2ib7
[236030.914924] LNet: 67094:0:(o2iblnd_cb.c:413:kiblnd_handle_rx()) Skipped 1 previous similar message
[236030.915787] LustreError: 67776:0:(ldlm_lib.c:3246:target_bulk_io()) @@@ timeout on bulk READ after -3+3s req@ffff9a76e23fb600 x1648544256512176/t0(0) o37->0c2e3585-4533-41f8-c50c-67b518a2e12d@10.8.21.30@o2ib6:226/0 lens 448/440 e 0 to 0 dl 1573068396 ref 1 fl Interpret:/0/0 rc 0/0
[236030.917507] LustreError: 67228:0:(tgt_handler.c:644:process_req_last_xid()) @@@ Unexpected xid 5db63f1770af0 vs. last_xid 5db63f18bb68f req@ffff9a73bec76300 x1648597182909168/t0(0) o37->0708c07a-be3a-4408-fc7c-6280de2b71dc@10.8.21.15@o2ib6:262/0 lens 448/0 e 0 to 0 dl 1573068432 ref 1 fl Interpret:/2/ffffffff rc 0/-1
[236031.341223] Lustre: 67148:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 70 previous similar messages
[236033.257350] Lustre: fir-MDT0002: Received new LWP connection from 10.0.10.54@o2ib7, removing former export from same NID
[236044.643242] LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.21.13@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server.
[236068.799925] Lustre: fir-MDT0002: Client fir-MDT0002-lwp-OST0038_UUID (at 10.0.10.109@o2ib7) reconnecting
[236068.799926] Lustre: fir-MDT0002: Client fir-MDT0002-lwp-OST003a_UUID (at 10.0.10.109@o2ib7) reconnecting
[236068.799927] Lustre: fir-MDT0002: Client fir-MDT0002-lwp-OST0034_UUID (at 10.0.10.109@o2ib7) reconnecting
[236068.799928] Lustre: fir-MDT0002: Client fir-MDT0002-lwp-OST0030_UUID (at 10.0.10.109@o2ib7) reconnecting
[236068.799929] Lustre: Skipped 28 previous similar messages
[236068.799932] Lustre: Skipped 28 previous similar messages
[236068.799933] Lustre: Skipped 28 previous similar messages
[236068.799946] Lustre: fir-MDT0002: Connection restored to (at 10.0.10.109@o2ib7)
[236068.799947] Lustre: fir-MDT0002: Connection restored to (at 10.0.10.109@o2ib7)
[236068.799948] Lustre: fir-MDT0002: Connection restored to (at 10.0.10.109@o2ib7)
[236068.799948] Lustre: Skipped 79 previous similar messages
[236068.799949] Lustre: Skipped 79 previous similar messages
[236068.799949] Lustre: Skipped 79 previous similar messages
[236068.892874] Lustre: Skipped 1 previous similar message
[236073.555801] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.8.21.7@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server.
[236081.512300] LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server.
[236109.199882] Lustre: fir-MDT0002: Received new LWP connection from 10.0.10.52@o2ib7, removing former export from same NID
[236173.837041] Lustre: fir-MDT0002: Connection restored to (at 10.9.106.44@o2ib4)
[236173.844438] Lustre: Skipped 346 previous similar messages
[236476.786966] Lustre: fir-MDT0002: Connection restored to d9e8d1d1-af07-ce57-473c-319ce9637cb5 (at 10.9.116.3@o2ib4)
[236804.888573] Lustre: fir-MDT0002: Connection restored to (at 10.8.17.16@o2ib6)
[236804.895908] Lustre: Skipped 3 previous similar messages
[237333.735982] Lustre: fir-MDT0002: Connection restored to 37cc22e4-119d-a221-d45d-efe10a9d8f35 (at 10.8.19.2@o2ib6)
[237333.746331] Lustre: Skipped 4 previous similar messages
[238388.951539] LNetError: 67090:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0)
[238405.352285] LNetError: 67088:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0)
[238423.809157] Lustre: fir-MDT0002: Client b5be2f5f-0f09-196f-7061-da3a3aa7cecb (at 10.8.20.31@o2ib6) reconnecting
[238423.819339] Lustre: Skipped 340 previous similar messages
[238423.824846] Lustre: fir-MDT0002: Connection restored to b5be2f5f-0f09-196f-7061-da3a3aa7cecb (at 10.8.20.31@o2ib6)
[238445.382469] Lustre: fir-MDT0002: Client 73960737-dbdc-9b6d-d342-4b8c815126a6 (at 10.8.28.3@o2ib6) reconnecting
[238445.392578] Lustre: Skipped 1 previous similar message
[238459.249646] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.8.20.31@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server.
[238518.387149] Lustre: fir-MDT0002: Client b5be2f5f-0f09-196f-7061-da3a3aa7cecb (at 10.8.20.31@o2ib6) reconnecting
[238524.115904] LNetError: 67086:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0)
[238532.440640] LNetError: 67094:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0)
[238532.453273] LNetError: 67094:0:(lib-msg.c:822:lnet_is_health_check()) Skipped 1 previous similar message
[238539.642455] LNetError: 67091:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0)
[238539.655057] LNetError: 67091:0:(lib-msg.c:822:lnet_is_health_check()) Skipped 1 previous similar message
[238552.011109] Lustre: 67936:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1573070913/real 0] req@ffff9a5463264c80 x1649330533783120/t0(0) o104->fir-MDT0002@10.8.27.34@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1573070920 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
[238552.037691] Lustre: 67936:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 1 previous similar message
[238661.642915] Lustre: fir-MDT0002: haven't heard from client b9315075-1ac1-90b5-e28c-c8b5bf0c9308 (at 10.9.103.56@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a61be4c1800, cur 1573071030 expire 1573070880 last 1573070803
[238661.664805] Lustre: Skipped 3 previous similar messages
[238814.646782] Lustre: fir-MDT0002: haven't heard from client 5fe84cf1-4f19-d7cc-c107-724967299ec4 (at 10.8.21.31@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a61b8f2c400, cur 1573071183 expire 1573071033 last 1573070956
[238814.668596] Lustre: Skipped 10 previous similar messages
[238890.648807] Lustre: fir-MDT0002: haven't heard from client fac8e8bf-d79f-581a-5782-e8d4e37525ac (at 10.9.104.25@o2ib4) in 188 seconds. I think it's dead, and I am evicting it. exp ffff9a61a9a29c00, cur 1573071259 expire 1573071109 last 1573071071
[238890.670683] Lustre: Skipped 2 previous similar messages
[239456.099686] Lustre: fir-MDT0002: Client ba7d4754-732a-6c3c-9b97-094af9e08a5e (at 10.8.21.25@o2ib6) reconnecting
[239456.109874] Lustre: Skipped 4 previous similar messages
[239456.115219] Lustre: fir-MDT0002: Connection restored to (at 10.8.21.25@o2ib6)
[239456.122542] Lustre: Skipped 7 previous similar messages
[239481.018788] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.8.21.25@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server.
[239481.036165] LustreError: Skipped 1 previous similar message
[239548.056231] LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.21.25@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server.
[239573.117151] Lustre: fir-MDT0002: Client ba7d4754-732a-6c3c-9b97-094af9e08a5e (at 10.8.21.25@o2ib6) reconnecting
[240090.474657] Lustre: fir-MDT0002: Connection restored to 544105de-8833-e38a-4ec5-601f76f65e5f (at 10.9.103.43@o2ib4)
[240090.485175] Lustre: Skipped 27 previous similar messages
[240799.094196] Lustre: fir-MDT0002: Connection restored to (at 10.9.104.25@o2ib4)
[240799.101601] Lustre: Skipped 46 previous similar messages
[241461.842153] Lustre: fir-MDT0002: Connection restored to cf57dede-7c30-d907-e5cf-b33746b40c8f (at 10.9.102.58@o2ib4)
[242439.218895] Lustre: fir-MDT0002: Connection restored to 169fde9b-f381-df8d-fc89-8ebd44760f1c (at 10.9.102.72@o2ib4)
[242439.229415] Lustre: Skipped 1 previous similar message
[243744.772924] Lustre: fir-MDT0002: haven't heard from client 7dca925c-5837-f294-ea73-093b77ab6860 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a75eeca7800, cur 1573076113 expire 1573075963 last 1573075886
[243873.160003] Lustre: fir-MDT0002: Connection restored to 8b633e26-6674-1118-73a2-cadebb615cfb (at 10.8.23.14@o2ib6)
[243873.170447] Lustre: Skipped 1 previous similar message
[244067.479715] Lustre: fir-MDT0002: Connection restored to 8b633e26-6674-1118-73a2-cadebb615cfb (at 10.8.23.14@o2ib6)
[244099.781626] Lustre: fir-MDT0002: haven't heard from client 2c7938f6-6d0b-c78d-8597-f5166fcf7214 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a59c1482400, cur 1573076468 expire 1573076318 last 1573076241
[244415.562786] Lustre: fir-MDT0002: Connection restored to 62482ffd-24d7-d281-f760-11482363fdfd (at 10.9.102.21@o2ib4)
[246326.840338] Lustre: fir-MDT0002: haven't heard from client c4c574ed-6e8c-3afd-ea20-1739543069a9 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6df535e000, cur 1573078695 expire 1573078545 last 1573078468
[246519.434593] Lustre: fir-MDT0002: Connection restored to 8b633e26-6674-1118-73a2-cadebb615cfb (at 10.8.23.14@o2ib6)
[246519.445026] Lustre: Skipped 1 previous similar message
[247247.506099] Lustre: fir-MDT0002: Connection restored to e1ce92ec-7e01-7202-260a-33faccfff07f (at 10.9.102.59@o2ib4)
[247299.832092] Lustre: fir-MDT0002: Connection restored to (at 10.9.102.70@o2ib4)
[247356.091434] Lustre: fir-MDT0002: Connection restored to 12b7e963-e16b-4d9e-d3f5-87bea9dc2355 (at 10.8.31.7@o2ib6)
[248570.440866] Lustre: fir-MDT0002: Connection restored to 8b633e26-6674-1118-73a2-cadebb615cfb (at 10.8.23.14@o2ib6)
[248617.899945] Lustre: fir-MDT0002: haven't heard from client f1843802-b276-1bdc-451d-5f53e14fec19 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a5182487400, cur 1573080986 expire 1573080836 last 1573080759
[248885.906145] Lustre: fir-MDT0002: haven't heard from client c2248554-3f66-c76f-6693-0c5891f02c88 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a51a00c9400, cur 1573081254 expire 1573081104 last 1573081027
[249095.815923] Lustre: fir-MDT0002: Connection restored to 8b633e26-6674-1118-73a2-cadebb615cfb (at 10.8.23.14@o2ib6)
[249287.985992] Lustre: fir-MDT0002: Connection restored to 8b633e26-6674-1118-73a2-cadebb615cfb (at 10.8.23.14@o2ib6)
[249321.919339] Lustre: fir-MDT0002: haven't heard from client 52fff601-8b43-145e-115c-f23abd250223 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a5cafc37000, cur 1573081690 expire 1573081540 last 1573081463
[249404.322991] LNet: 67092:0:(o2iblnd_cb.c:413:kiblnd_handle_rx()) PUT_NACK from 10.0.10.201@o2ib7
[249404.331784] LNet: 67092:0:(o2iblnd_cb.c:413:kiblnd_handle_rx()) Skipped 1 previous similar message
[249405.108154] Lustre: 67628:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1573081766/real 1573081766] req@ffff9a75dffdf980 x1649330584349616/t0(0) o104->fir-MDT0002@10.8.27.35@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1573081773 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[249405.129483] Lustre: fir-MDT0002: Client 23e532ac-5cec-15dc-e56e-ef1fad067124 (at 10.8.0.82@o2ib6) reconnecting
[249405.129502] Lustre: fir-MDT0002: Connection restored to (at 10.8.0.82@o2ib6)
[249407.140876] Lustre: fir-MDT0002: Client 2d5c053e-695e-1a68-298c-a84f7f405d94 (at 10.8.8.24@o2ib6) reconnecting
[249407.150972] Lustre: Skipped 11 previous similar messages
[249412.013735] Lustre: fir-MDT0002: Client 172ec88c-3454-1411-8e15-a9b5202e9e30 (at 10.8.21.8@o2ib6) reconnecting
[249412.023826] Lustre: Skipped 6 previous similar messages
[249412.152331] Lustre: 67628:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1573081773/real 1573081773] req@ffff9a75dffdf980 x1649330584349616/t0(0) o104->fir-MDT0002@10.8.27.35@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1573081780 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[249412.179663] Lustre: 67628:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 1 previous similar message
[249420.073801] Lustre: fir-MDT0002: Client 3d533f49-7625-62e4-5877-df77da08f33c (at 10.8.24.27@o2ib6) reconnecting
[249420.083974] Lustre: Skipped 48 previous similar messages
[249423.917728] Lustre: fir-MDT0002: Connection restored to 4ae1953c-c5de-651a-c222-99cb1d82d019 (at 10.8.7.6@o2ib6)
[249423.927991] Lustre: Skipped 94 previous similar messages
[249426.272589] LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.30.22@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server.
[249428.041000] LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.31.9@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server.
[249429.726548] LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.24.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server.
[249431.775785] LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.21.34@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server.
[249431.793151] LustreError: Skipped 1 previous similar message
[249432.076854] Lustre: 67937:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1573081793/real 1573081793] req@ffff9a60928e0900 x1649330584367024/t0(0) o106->fir-MDT0002@10.8.27.35@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1573081800 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[249432.104233] Lustre: 67937:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
[249435.794524] LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.8.30@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server.
[249435.811803] LustreError: Skipped 7 previous similar messages
[249437.370890] Lustre: fir-MDT0002: Client 391dd594-5f4c-e8ee-d84f-c1c10aa53998 (at 10.8.28.1@o2ib6) reconnecting
[249437.380979] Lustre: Skipped 72 previous similar messages
[249439.086034] Lustre: 67784:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1573081800/real 1573081800] req@ffff9a51a7ca8000 x1649330584367056/t0(0) o106->fir-MDT0002@10.8.27.35@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1573081807 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[249444.105432] LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.19.1@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server.
[249444.122714] LustreError: Skipped 20 previous similar messages
[249453.137394] Lustre: 67784:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1573081814/real 1573081814] req@ffff9a71b7359200 x1649330584383360/t0(0) o106->fir-MDT0002@10.8.27.35@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1573081821 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[249453.164731] Lustre: 67784:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 1 previous similar message
[249460.676445] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.8.7.7@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server.
[249460.693641] LustreError: Skipped 42 previous similar messages
[249462.316994] Lustre: fir-MDT0002: Connection restored to a24a1efe-3bf8-fd3c-c065-399cdd330bf9 (at 10.8.0.65@o2ib6)
[249462.327340] Lustre: Skipped 84 previous similar messages
[249469.401689] Lustre: fir-MDT0002: Client a24a1efe-3bf8-fd3c-c065-399cdd330bf9 (at 10.8.0.65@o2ib6) reconnecting
[249469.411775] Lustre: Skipped 49 previous similar messages
[249472.602911] LustreError: 68030:0:(sec.c:2485:sptlrpc_svc_unwrap_bulk()) @@@ truncated bulk GET 0(131072) req@ffff9a6068e2c850 x1649069159553504/t0(0) o4->67360d0f-602d-e0fd-a763-b6dc0eec238b@10.8.27.35@o2ib6:101/0 lens 488/448 e 2 to 0 dl 1573081861 ref 1 fl Interpret:/0/0 rc 0/0
[249472.602914] LustreError: 68023:0:(sec.c:2485:sptlrpc_svc_unwrap_bulk()) @@@ truncated bulk GET 0(131072) req@ffff9a6068e2d050 x1649069159553440/t0(0) o4->67360d0f-602d-e0fd-a763-b6dc0eec238b@10.8.27.35@o2ib6:101/0 lens 488/448 e 2 to 0 dl 1573081861 ref 1 fl Interpret:/0/0 rc 0/0
[249472.602918] LustreError: 68023:0:(sec.c:2485:sptlrpc_svc_unwrap_bulk()) Skipped 2 previous similar messages
[249472.602925] LustreError: 68011:0:(ldlm_lib.c:3271:target_bulk_io()) @@@ truncated bulk READ 0(32768) req@ffff9a6044f89050 x1649347687821456/t0(0) o3->74bb7759-5b69-188d-1d68-d42ea52dd73e@10.8.22.9@o2ib6:93/0 lens 488/440 e 3 to 0 dl 1573081853 ref 1 fl Interpret:/0/0 rc 0/0
[249472.602933] Lustre: fir-MDT0002: Bulk IO write error with 67360d0f-602d-e0fd-a763-b6dc0eec238b (at 10.8.27.35@o2ib6), client will retry: rc = -110
[249472.602939] Lustre: fir-MDT0002: Bulk IO read error with 74bb7759-5b69-188d-1d68-d42ea52dd73e (at 10.8.22.9@o2ib6), client will retry: rc -110
[249472.602941] Lustre: Skipped 1 previous similar message
[249472.718267] LustreError: 68030:0:(sec.c:2485:sptlrpc_svc_unwrap_bulk()) Skipped 2 previous similar messages
[249476.823002] Lustre: 67784:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1573081837/real 1573081837] req@ffff9a71b7359b00 x1649330584412272/t0(0) o106->fir-MDT0002@10.8.27.35@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1573081844 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[249476.850338] Lustre: 67784:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 1 previous similar message
[249493.682432] LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.7.15@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server.
[249493.699731] LustreError: Skipped 68 previous similar messages
[249533.655042] Lustre: fir-MDT0002: Client 000d6715-906a-fe00-99d9-1ba39760e7f7 (at 10.8.22.16@o2ib6) reconnecting
[249533.665218] Lustre: Skipped 214 previous similar messages
[249538.708616] Lustre: fir-MDT0002: Connection restored to 2d5c053e-695e-1a68-298c-a84f7f405d94 (at 10.8.8.24@o2ib6)
[249538.718966] Lustre: Skipped 235 previous similar messages
[249557.850546] LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.24.6@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server.
[249557.867828] LustreError: Skipped 286 previous similar messages
[249599.925243] Lustre: fir-MDT0002: haven't heard from client aa5fd962-1a02-82a9-7be3-d88f6962947a (at 10.8.23.14@o2ib6) in 190 seconds. I think it's dead, and I am evicting it. exp ffff9a5187f58800, cur 1573081968 expire 1573081818 last 1573081778
[249824.631062] Lustre: fir-MDT0002: Connection restored to 8b633e26-6674-1118-73a2-cadebb615cfb (at 10.8.23.14@o2ib6)
[249824.641496] Lustre: Skipped 119 previous similar messages
[249841.931724] Lustre: fir-MDT0002: haven't heard from client ae0e1624-e7fa-5941-8bed-3ed83a4c909c (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a7e07434000, cur 1573082210 expire 1573082060 last 1573081983
[249841.953514] Lustre: Skipped 1 previous similar message
[249912.772277] LNetError: 67085:0:(o2iblnd_cb.c:3350:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds
[249912.782537] LNetError: 67085:0:(o2iblnd_cb.c:3350:kiblnd_check_txs_locked()) Skipped 6 previous similar messages
[249912.792792] LNetError: 67085:0:(o2iblnd_cb.c:3425:kiblnd_check_conns()) Timed out RDMA with 10.0.10.201@o2ib7 (69): c: 8, oc: 0, rc: 8
[249912.804958] LNetError: 67085:0:(o2iblnd_cb.c:3425:kiblnd_check_conns()) Skipped 6 previous similar messages
[249950.773258] LNetError: 67085:0:(o2iblnd_cb.c:3350:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds
[249950.783517] LNetError: 67085:0:(o2iblnd_cb.c:3425:kiblnd_check_conns()) Timed out RDMA with 10.0.10.201@o2ib7 (107): c: 8, oc: 0, rc: 8
[250044.669012] LNetError: 321:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.201@o2ib7 rejected: consumer defined fatal error
[250044.680234] LNetError: 321:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 12 previous similar messages
[250350.783640] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.201@o2ib7: 0 seconds
[250350.793903] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 38 previous similar messages
[250651.792434] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.201@o2ib7: 0 seconds
[250952.800271] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.201@o2ib7: 0 seconds
[251016.716505] Lustre: fir-MDT0002: Connection restored to ad78ba27-aeb2-dbe1-db29-c0000baf9c0d (at 10.9.101.12@o2ib4)
[251124.963800] Lustre: fir-MDT0002: haven't heard from client 7c7c6c55-ee0a-39dc-ced8-86854b97f795 (at 10.9.116.4@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a91ba0d3c00, cur 1573083493 expire 1573083343 last 1573083266
[251175.602786] Lustre: fir-MDT0002: Connection restored to 2e7f010b-909f-57f0-4c10-1a8709f95140 (at 10.9.103.17@o2ib4)
[251175.613310] Lustre: Skipped 1 previous similar message
[251254.808148] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.201@o2ib7: 0 seconds
[251555.815013] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.201@o2ib7: 0 seconds
[251856.822903] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.201@o2ib7: 0 seconds
[251899.900969] Lustre: fir-MDT0002: Connection restored to (at 10.9.102.17@o2ib4)
[252157.830823] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.201@o2ib7: 0 seconds
[252459.838744] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.201@o2ib7: 1 seconds
[252732.052224] Lustre: fir-MDT0002: Connection restored to 8c089ecf-e780-5cb0-f98b-62bf79bae88b (at 10.9.102.18@o2ib4)
[253013.809579] Lustre: fir-MDT0002: Connection restored to (at 10.9.102.19@o2ib4)
[253060.854655] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.201@o2ib7: 0 seconds
[253060.864916] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 1 previous similar message
[253223.483427] Lustre: fir-MDT0002: Connection restored to (at 10.9.116.4@o2ib4)
[253454.758833] Lustre: fir-MDT0002: Connection restored to ae261aa4-0204-6665-18ba-7ed4dcc197e1 (at 10.9.102.20@o2ib4)
[253661.870646] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.201@o2ib7: 0 seconds
[253661.880904] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 1 previous similar message
[253731.033504] Lustre: fir-MDT0002: haven't heard from client 6ce14f6e-2239-ec0d-20cd-0c33b259b570 (at 10.9.104.69@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a84fe3a7c00, cur 1573086099 expire 1573085949 last 1573085872
[254253.047435] Lustre: fir-MDT0002: haven't heard from client e89d0864-ec67-c4fd-e7a2-ecf54e4ac209 (at 10.9.104.26@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a7758cdbc00, cur 1573086621 expire 1573086471 last 1573086394
[254263.886737] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.201@o2ib7: 0 seconds
[255167.910879] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.201@o2ib7: 1 seconds
[255167.921138] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 1 previous similar message
[255257.073296] Lustre: fir-MDT0002: haven't heard from client c9a4e4ca-4333-ab9e-b798-6368f07701ee (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a7da3323c00, cur 1573087625 expire 1573087475 last 1573087398
[255257.095092] Lustre: Skipped 4 previous similar messages
[255373.431479] Lustre: fir-MDT0002: Connection restored to 8b633e26-6674-1118-73a2-cadebb615cfb (at 10.8.23.14@o2ib6)
[255388.752972] Lustre: fir-MDT0002: Connection restored to f5b732ff-4959-4283-d29c-fcd8fac11c91 (at 10.9.113.1@o2ib4)
[255422.109831] Lustre: fir-MDT0002: Connection restored to 37cc22e4-119d-a221-d45d-efe10a9d8f35 (at 10.8.19.2@o2ib6)
[255561.463478] Lustre: fir-MDT0002: Connection restored to (at 10.9.103.53@o2ib4)
[255577.580905] Lustre: fir-MDT0002: Connection restored to 0a4833fd-948b-801d-e529-380337f0c2cd (at 10.9.108.30@o2ib4)
[255724.740444] Lustre: fir-MDT0002: Connection restored to 0e67c056-6635-c70f-79c1-ba277ef85897 (at 10.9.107.17@o2ib4)
[255768.926659] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.201@o2ib7: 0 seconds
[255768.936917] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 1 previous similar message
[255785.798845] Lustre: fir-MDT0002: Connection restored to de003c45-7c58-3340-275a-412406d68c28 (at 10.8.30.35@o2ib6)
[255880.985802] Lustre: fir-MDT0002: Connection restored to 831c1d15-0574-34d2-10fb-653f9f596824 (at 10.9.108.32@o2ib4)
[256058.033334] Lustre: fir-MDT0002: Connection restored to 668eb028-82c2-c6e3-1d8f-48e15c9354b0 (at 10.9.104.26@o2ib4)
[256058.043854] Lustre: Skipped 1 previous similar message
[256371.942541] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.201@o2ib7: 0 seconds
[256371.952800] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 1 previous similar message
[256973.958208] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.201@o2ib7: 0 seconds
[256973.968464] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 1 previous similar message
[257576.974169] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.201@o2ib7: 1 seconds
[257576.984431] LNet: 67085:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 1 previous similar message
[258784.497083] Lustre: fir-MDT0002: Connection restored to (at 10.9.108.31@o2ib4)
[258784.504488] Lustre: Skipped 4 previous similar messages
[259001.909787] Lustre: fir-MDT0002: Connection restored to 14a7252d-059e-f55d-9bba-02f9b98a4298 (at 10.9.112.15@o2ib4)
[260495.357706] Lustre: fir-MDT0002: Connection restored to ceabbfe7-ac2f-78e4-da38-20972008d508 (at 10.9.103.19@o2ib4)
[260881.222052] Lustre: fir-MDT0002: haven't heard from client f5c43614-c89b-6832-89f7-4d8ce26ab295 (at 10.8.13.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a618b599c00, cur 1573093249 expire 1573093099 last 1573093022
[262068.439574] Lustre: fir-MDT0002: Connection restored to (at 10.9.104.18@o2ib4)
[262812.572180] Lustre: fir-MDT0002: Connection restored to a803fd37-971a-84b4-4111-c9057e8bb466 (at 10.9.103.13@o2ib4)
[264346.026658] Lustre: fir-MDT0002: Connection restored to (at 10.9.104.28@o2ib4)
[264974.852761] Lustre: fir-MDT0002: Connection restored to (at 10.9.104.68@o2ib4)
[265126.825161] Lustre: fir-MDT0002: Connection restored to 8b633e26-6674-1118-73a2-cadebb615cfb (at 10.8.23.14@o2ib6)
[265148.334199] Lustre: fir-MDT0002: haven't heard from client 04cf6d47-b56c-45a6-cd39-90e365a8670c (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a7b8e1efc00, cur 1573097516 expire 1573097366 last 1573097289
[265187.253375] Lustre: fir-MDT0002: Connection restored to (at 10.9.103.33@o2ib4)
[265405.350646] Lustre: fir-MDT0002: haven't heard from client 4e1cf2d0-1983-8049-a8ac-338ac9bd000c (at 10.8.9.1@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a63927bc400, cur 1573097773 expire 1573097623 last 1573097546
[265749.305118] Lustre: fir-MDT0002: Connection restored to 0daf9bd4-938e-b64e-6198-3308ff9f82c0 (at 10.9.103.52@o2ib4)
[265760.056507] Lustre: fir-MDT0002: Connection restored to 1b5008f0-3c80-3338-d2fb-ae2c35e247c2 (at 10.9.103.50@o2ib4)
[265805.247446] Lustre: fir-MDT0002: Connection restored to 691c0288-af93-ee97-99a8-fb762d993d54 (at 10.9.103.14@o2ib4)
[265805.257968] Lustre: Skipped 2 previous similar messages
[266080.828903] Lustre: fir-MDT0002: Connection restored to 8b633e26-6674-1118-73a2-cadebb615cfb (at 10.8.23.14@o2ib6)
[266108.360121] Lustre: fir-MDT0002: haven't heard from client b85823d6-ca5b-9bbb-f63b-60e1f1f6c122 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a68eaa40400, cur 1573098476 expire 1573098326 last 1573098249
[266330.365487] Lustre: fir-MDT0002: haven't heard from client 3b18e546-51b7-d340-78d1-ef860e7a1da3 (at 10.9.106.57@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a91a7fce800, cur 1573098698 expire 1573098548 last 1573098471
[267144.107236] Lustre: fir-MDT0002: Connection restored to 3bf8d96a-eb18-163f-e988-cae49d49bd9b (at 10.9.104.3@o2ib4)
[267148.889135] Lustre: fir-MDT0002: Connection restored to (at 10.9.104.4@o2ib4)
[267172.630575] Lustre: fir-MDT0002: Connection restored to 1381947c-e40c-dd9a-486b-3fe2b4d4fee2 (at 10.9.104.6@o2ib4)
[267189.326066] Lustre: fir-MDT0002: Connection restored to 39a18e22-d2ec-e4be-c33a-c2a0f3672cb5 (at 10.9.102.15@o2ib4)
[267226.330432] Lustre: fir-MDT0002: Connection restored to (at 10.9.104.7@o2ib4)
[267226.337750] Lustre: Skipped 7 previous similar messages
[267839.020175] Lustre: fir-MDT0002: Connection restored to (at 10.9.106.57@o2ib4)
[267839.027581] Lustre: Skipped 3 previous similar messages
[268422.428051] Lustre: fir-MDT0002: Connection restored to f5b732ff-4959-4283-d29c-fcd8fac11c91 (at 10.9.113.1@o2ib4)
[268906.997891] Lustre: fir-MDT0002: Connection restored to 55bfcee1-1e97-417e-55f9-eff80c0f1d78 (at 10.9.103.25@o2ib4)
[269947.467607] Lustre: fir-MDT0002: haven't heard from client bf1da82c-665b-e517-6b65-b961002223bb (at 10.9.108.2@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a8c98b6f800, cur 1573102315 expire 1573102165 last 1573102088
[270611.474466] Lustre: fir-MDT0002: haven't heard from client bfefa7f9-ffb2-c1b9-bbcb-b312035daa83 (at 10.9.106.54@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a747fced800, cur 1573102979 expire 1573102829 last 1573102752
[270713.276706] Lustre: fir-MDT0002: Connection restored to (at 10.9.106.54@o2ib4)
[271292.751978] Lustre: fir-MDT0002: Connection restored to bf1da82c-665b-e517-6b65-b961002223bb (at 10.9.108.2@o2ib4)
[274073.593021] Lustre: fir-MDT0002: Connection restored to (at 10.9.114.5@o2ib4)
[274357.709691] Lustre: fir-MDT0002: Connection restored to (at 10.9.103.29@o2ib4)
[276129.011291] Lustre: fir-MDT0002: Connection restored to (at 10.9.103.31@o2ib4)
[277009.648281] Lustre: fir-MDT0002: haven't heard from client 09cbef19-78b4-12a7-6dd6-4d898acd5eaf (at 10.9.103.18@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a91a7f80800, cur 1573109377 expire 1573109227 last 1573109150
[278451.413269] Lustre: fir-MDT0002: Connection restored to (at 10.9.103.26@o2ib4)
[278830.120013] Lustre: fir-MDT0002: Connection restored to 1225bda0-340e-0547-5edd-8ffaf61f5677 (at 10.9.103.15@o2ib4)
[278831.382065] Lustre: fir-MDT0002: Connection restored to 09cbef19-78b4-12a7-6dd6-4d898acd5eaf (at 10.9.103.18@o2ib4)
[278848.554997] Lustre: fir-MDT0002: Connection restored to a97a227c-9eb9-17ee-5bf9-08381e72d1e7 (at 10.9.103.10@o2ib4)
[278860.708764] Lustre: fir-MDT0002: Connection restored to (at 10.9.103.1@o2ib4)
[279830.945611] Lustre: fir-MDT0002: Connection restored to c3c3c828-0658-8253-2e2b-6619a84333a7 (at 10.8.29.8@o2ib6)
[279830.955982] Lustre: Skipped 2 previous similar messages
[280770.415088] Lustre: fir-MDT0002: Connection restored to (at 10.9.103.35@o2ib4)
[280945.364970] Lustre: fir-MDT0002: Connection restored to 84648fdf-0ed8-b455-0f7d-9ac50e45ba9c (at 10.9.104.71@o2ib4)
[282578.106651] Lustre: fir-MDT0002: Connection restored to (at 10.9.103.12@o2ib4)
[284817.939602] Lustre: fir-MDT0002: Connection restored to 6154eab8-119a-0972-9137-942b12876d35 (at 10.9.104.31@o2ib4)
[284821.044171] Lustre: fir-MDT0002: Connection restored to (at 10.9.104.30@o2ib4)
[287625.593495] Lustre: fir-MDT0002: Connection restored to (at 10.9.114.5@o2ib4)
[289138.911195] Lustre: fir-MDT0002: Connection restored to fa19a1bd-6cf2-93a3-9cc0-0e0a9491f3fd (at 10.9.102.26@o2ib4)
[290190.023165] Lustre: fir-MDT0002: Connection restored to 16834498-f082-b8d6-0fed-822dab1a074a (at 10.8.26.35@o2ib6)
[291427.250369] Lustre: fir-MDT0002: Connection restored to deb753af-3b02-0448-210e-a3c986ff2f59 (at 10.9.102.50@o2ib4)
[291448.512783] Lustre: fir-MDT0002: Connection restored to (at 10.9.102.51@o2ib4)
[291457.352820] Lustre: fir-MDT0002: Connection restored to a2c68999-74a9-38eb-896f-997864ca175d (at 10.9.102.55@o2ib4)
[291906.045713] Lustre: fir-MDT0002: haven't heard from client d4760e60-2984-faa1-747a-4901ddf94ab3 (at 10.9.102.49@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a91abf74c00, cur 1573124273 expire 1573124123 last 1573124046
[291906.067635] Lustre: Skipped 4 previous similar messages
[293679.250745] Lustre: fir-MDT0002: Connection restored to d4760e60-2984-faa1-747a-4901ddf94ab3 (at 10.9.102.49@o2ib4)
[293679.261267] Lustre: Skipped 1 previous similar message
[293751.073157] Lustre: fir-MDT0002: Connection restored to 54328142-7f6b-6ea1-f253-6ef62378642f (at 10.9.102.53@o2ib4)
[293873.189229] Lustre: fir-MDT0002: Connection restored to (at 10.9.103.32@o2ib4)
[293968.082480] Lustre: fir-MDT0002: haven't heard from client ef7799b5-3274-30a3-b849-8ee207a32daa (at 10.9.115.11@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a771fe02c00, cur 1573126335 expire 1573126185 last 1573126108
[293972.982198] Lustre: fir-MDT0002: Connection restored to ba15a10b-a95d-133b-f873-39f741c8accb (at 10.9.115.11@o2ib4)
[296576.145296] Lustre: fir-MDT0002: haven't heard from client efa841f1-8cbc-8f62-0b65-4658ef1a3b8e (at 10.9.109.37@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a61a9a2a000, cur 1573128943 expire 1573128793 last 1573128716
[298878.325129] Lustre: fir-MDT0002: Connection restored to 98a36b0f-e51d-2359-ff75-e229118281fa (at 10.9.103.23@o2ib4)
[299670.760776] Lustre: fir-MDT0002: Connection restored to (at 10.9.103.6@o2ib4)
[301991.641506] Lustre: fir-MDT0002: Connection restored to 496d5991-97fa-bf72-73bd-f159bb6d190f (at 10.9.102.11@o2ib4)
[302008.049412] Lustre: fir-MDT0002: Connection restored to (at 10.9.103.34@o2ib4)
[302429.002571] Lustre: fir-MDT0002: Connection restored to 8b633e26-6674-1118-73a2-cadebb615cfb (at 10.8.23.14@o2ib6)
[302435.296459] Lustre: fir-MDT0002: haven't heard from client d7a6284c-7cc7-450a-e364-1580617fe154 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a771b968800, cur 1573134802 expire 1573134652 last 1573134575
[302749.651408] Lustre: fir-MDT0002: Connection restored to (at 10.9.103.27@o2ib4)
[303832.459407] Lustre: fir-MDT0002: Connection restored to dbff513f-01d4-cbaf-a3cc-37ccf6c9e544 (at 10.9.103.5@o2ib4)
[304118.716331] Lustre: fir-MDT0002: Connection restored to af36ff0f-f087-f3b7-da57-859174346889 (at 10.9.103.9@o2ib4)
[304253.344017] Lustre: fir-MDT0002: haven't heard from client 9173f863-dd9e-bf41-44e4-ef2a9caf85e3 (at 10.9.102.12@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a618b59ac00, cur 1573136620 expire 1573136470 last 1573136393
[305949.387668] Lustre: fir-MDT0002: haven't heard from client 71ddb392-3190-c4dc-8641-8394d6133acc (at 10.9.104.23@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a90f9abcc00, cur 1573138316 expire 1573138166 last 1573138089
[306071.299363] Lustre: fir-MDT0002: Connection restored to 9173f863-dd9e-bf41-44e4-ef2a9caf85e3 (at 10.9.102.12@o2ib4)
[306824.880572] Lustre: fir-MDT0002: Connection restored to 72f0cb85-4721-a907-7f48-cbe6eb1fa59e (at 10.9.104.65@o2ib4)
[307081.833279] Lustre: fir-MDT0002: Connection restored to ba15a10b-a95d-133b-f873-39f741c8accb (at 10.9.115.11@o2ib4)
[307745.499457] Lustre: fir-MDT0002: Connection restored to 5ebbffd8-95c2-3ef5-84d0-408c87dbc1da (at 10.9.104.24@o2ib4)
[307766.246483] Lustre: fir-MDT0002: Connection restored to 71ddb392-3190-c4dc-8641-8394d6133acc (at 10.9.104.23@o2ib4)
[307808.896455] Lustre: fir-MDT0002: Connection restored to (at 10.9.104.22@o2ib4)
[308620.455695] Lustre: fir-MDT0002: haven't heard from client 37ea3598-091c-a321-ed29-19c3d450a295 (at 10.9.110.23@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a61b8f2b400, cur 1573140987 expire 1573140837 last 1573140760
[308620.477663] Lustre: Skipped 1 previous similar message
[308922.385553] Lustre: fir-MDT0002: Connection restored to (at 10.9.104.72@o2ib4)
[310094.023261] Lustre: fir-MDT0002: Connection restored to 37ea3598-091c-a321-ed29-19c3d450a295 (at 10.9.110.23@o2ib4)
[311543.563613] Lustre: fir-MDT0002: Connection restored to ba15a10b-a95d-133b-f873-39f741c8accb (at 10.9.115.11@o2ib4)
[312344.174702] Lustre: fir-MDT0002: Connection restored to 9fbd6da3-5b17-f25f-a6c9-c0b28e27f769 (at 10.9.103.20@o2ib4)
[312714.504949] Lustre: fir-MDT0002: Connection restored to fe81d83c-7934-4c2d-61e5-4830373aee38 (at 10.9.104.21@o2ib4)
[316046.987650] Lustre: fir-MDT0002: Connection restored to 27e653ad-83dd-4171-c80b-4ffe80f27b96 (at 10.9.102.7@o2ib4)
[316314.335865] Lustre: fir-MDT0002: Connection restored to fc85a6cc-3249-1d3e-9a39-9bb09055d536 (at 10.9.105.33@o2ib4)
[316357.654836] Lustre: fir-MDT0002: haven't heard from client 8d705b6b-a342-7dc2-3479-95b70cf0aed7 (at 10.9.105.33@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a78ffb39800, cur 1573148724 expire 1573148574 last 1573148497
[316864.682799] Lustre: fir-MDT0002: Connection restored to (at 10.9.103.22@o2ib4)
[317453.416870] Lustre: fir-MDT0002: Connection restored to (at 10.9.103.36@o2ib4)
[319311.112535] Lustre: fir-MDT0002: Connection restored to c02a4ac1-d5ed-d02a-93ed-6506496c526b (at 10.9.104.60@o2ib4)
[319752.021049] LNetError: 67088:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5)
[319942.278018] Lustre: fir-MDT0002: Connection restored to 7e06b451-e64c-4031-84a5-0ce5dd979b27 (at 10.9.102.63@o2ib4)
[319968.149086] Lustre: fir-MDT0002: Connection restored to 396a5f89-a531-7c93-2ecb-3464fa975ea1 (at 10.9.102.64@o2ib4)
[319974.881997] Lustre: fir-MDT0002: Connection restored to (at 10.9.102.66@o2ib4)
[319974.889406] Lustre: Skipped 1 previous similar message
[319980.281921] Lustre: fir-MDT0002: Connection restored to 3546b1e3-9cda-bb4f-7bfd-2ecb357d6fd9 (at 10.9.102.65@o2ib4)
[319985.076231] Lustre: fir-MDT0002: Connection restored to b6b4042d-9c67-5a7d-efa8-f4b6817634c9 (at 10.9.102.61@o2ib4)
[319985.086762] Lustre: Skipped 2 previous similar messages
[320012.948100] Lustre: fir-MDT0002: Connection restored to (at 10.9.102.62@o2ib4)
[320046.367207] Lustre: fir-MDT0002: Connection restored to dce7737f-50d2-c6cd-dc81-97199be784c4 (at 10.9.102.67@o2ib4)
[320907.118279] Lustre: fir-MDT0002: Connection restored to 87024fcd-e9de-4931-86b0-b8038d2cef0f (at 10.8.30.15@o2ib6)
[321004.516670] Lustre: fir-MDT0002: Connection restored to 8b633e26-6674-1118-73a2-cadebb615cfb (at 10.8.23.14@o2ib6)
[321026.773874] Lustre: fir-MDT0002: haven't heard from client e72ab8ad-e34a-af8f-6f7f-eaae1e123b5a (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6bae97e000, cur 1573153393 expire 1573153243 last 1573153166
[321805.304497] Lustre: fir-MDT0002: Connection restored to 1b7d0c08-44b0-a824-41bf-c5c53434672b (at 10.9.113.11@o2ib4)
[322514.149619] Lustre: fir-MDT0002: Connection restored to (at 10.9.103.30@o2ib4)
[322897.967250] Lustre: fir-MDT0002: Connection restored to (at 10.9.114.7@o2ib4)
[325877.936496] Lustre: fir-MDT0002: Connection restored to 96656a1b-fbbb-d05d-d09f-af7451910c2f (at 10.9.102.25@o2ib4)
[326179.450313] Lustre: fir-MDT0002: Connection restored to 6bc0eaa0-33d4-860c-2fc1-da80757897fd (at 10.8.9.2@o2ib6)
[332556.430555] Lustre: fir-MDT0002: Connection restored to 9ffe0fe6-ef48-4274-f15d-45214f0f248b (at 10.9.104.66@o2ib4)
[332600.679771] Lustre: fir-MDT0002: Connection restored to (at 10.9.103.28@o2ib4)
[333888.848235] Lustre: fir-MDT0002: Connection restored to 6bc0eaa0-33d4-860c-2fc1-da80757897fd (at 10.8.9.2@o2ib6)
[334723.125963] Lustre: fir-MDT0002: haven't heard from client c5f752ad-463e-c0fc-bbb6-5b29206ddbd4 (at 10.9.109.53@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a61b8d0c400, cur 1573167089 expire 1573166939 last 1573166862
[335696.877757] Lustre: fir-MDT0002: Connection restored to (at 10.9.103.11@o2ib4)
[336119.657348] Lustre: fir-MDT0002: Connection restored to c5f752ad-463e-c0fc-bbb6-5b29206ddbd4 (at 10.9.109.53@o2ib4)
[339565.072501] Lustre: fir-MDT0002: Connection restored to 6629cdea-3e58-e4a4-7a12-64efc3dfd807 (at 10.9.109.56@o2ib4)
[341658.168626] Lustre: fir-MDT0002: Connection restored to 8b633e26-6674-1118-73a2-cadebb615cfb (at 10.8.23.14@o2ib6)
[341685.307103] Lustre: fir-MDT0002: haven't heard from client b8476795-bd9a-24f4-71cd-8f0f4415aacf (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a58ad062400, cur 1573174051 expire 1573173901 last 1573173824
[350384.562446] Lustre: fir-MDT0002: haven't heard from client 3267df8e-4520-8b87-bad4-c836649229ba (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a53afc71800, cur 1573182750 expire 1573182600 last 1573182523
[350412.869647] Lustre: fir-MDT0002: Connection restored to (at 10.8.26.4@o2ib6)
[350719.791023] Lustre: fir-MDT0002: Connection restored to (at 10.8.26.4@o2ib6)
[350779.550294] Lustre: fir-MDT0002: haven't heard from client 599b1614-11c2-9c4f-12a2-4812f7a0ed77 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6ec8bb7000, cur 1573183145 expire 1573182995 last 1573182918
[351725.661115] Lustre: fir-MDT0002: Connection restored to (at 10.8.26.4@o2ib6)
[351772.577057] Lustre: fir-MDT0002: haven't heard from client b5631fa5-1e4c-a812-c72e-092f74603595 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a638460bc00, cur 1573184138 expire 1573183988 last 1573183911
[351775.967412] Lustre: fir-MDT0002: Connection restored to 8b633e26-6674-1118-73a2-cadebb615cfb (at 10.8.23.14@o2ib6)
[352029.967463] Lustre: fir-MDT0002: Connection restored to (at 10.9.104.11@o2ib4)
[356259.692374] Lustre: fir-MDT0002: haven't heard from client 84fe4322-0bf9-6fc7-4c46-ef148bf26b79 (at 10.9.104.20@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a8ceab73400, cur 1573188625 expire 1573188475 last 1573188398
[356259.714250] Lustre: Skipped 1 previous similar message
[356913.048237] Lustre: fir-MDT0002: Connection restored to (at 10.9.103.21@o2ib4)
[358092.970037] Lustre: fir-MDT0002: Connection restored to 653243c0-2ba0-99c8-73b9-eff7ee887c72 (at 10.9.104.19@o2ib4)
[358105.754883] Lustre: fir-MDT0002: Connection restored to (at 10.9.104.20@o2ib4)
[360133.793110] Lustre: fir-MDT0002: haven't heard from client 468f7b3c-1170-9b0c-3534-a8d3e646f567 (at 10.9.104.17@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a90a3671000, cur 1573192499 expire 1573192349 last 1573192272
[360133.815007] Lustre: Skipped 1 previous similar message
[361944.388260] Lustre: fir-MDT0002: Connection restored to (at 10.9.104.17@o2ib4)
[364814.648429] Lustre: fir-MDT0002: Connection restored to (at 10.9.104.32@o2ib4)
[364976.414977] Lustre: fir-MDT0002: Connection restored to 8b633e26-6674-1118-73a2-cadebb615cfb (at 10.8.23.14@o2ib6)
[365015.916314] Lustre: fir-MDT0002: haven't heard from client bd33e674-7ce3-e36e-79ef-cb1e64ddaf0e (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a66fe80bc00, cur 1573197381 expire 1573197231 last 1573197154
[374731.770378] Lustre: fir-MDT0002: Connection restored to 9c3c9488-58ff-c1a9-13d1-a85e72e5b832 (at 10.9.104.58@o2ib4)
[375715.194454] Lustre: fir-MDT0002: haven't heard from client f7e67c6f-dd4d-0bb7-a94b-1926eb6a7ca8 (at 10.9.102.56@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a91bed08400, cur 1573208080 expire 1573207930 last 1573207853
[376775.997640] Lustre: fir-MDT0002: Connection restored to (at 10.9.103.72@o2ib4)
[377511.697042] Lustre: fir-MDT0002: Connection restored to f7e67c6f-dd4d-0bb7-a94b-1926eb6a7ca8 (at 10.9.102.56@o2ib4)
[377525.343360] Lustre: fir-MDT0002: Connection restored to (at 10.9.102.57@o2ib4)
[377625.239127] Lustre: fir-MDT0002: haven't heard from client 771319f8-9b7f-0275-9a0c-b3c50763c7ff (at 10.9.104.67@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a91abf77800, cur 1573209990 expire 1573209840 last 1573209763
[377625.261006] Lustre: Skipped 1 previous similar message
[379442.808920] Lustre: fir-MDT0002: Connection restored to (at 10.9.104.67@o2ib4)
[382353.754721] Lustre: fir-MDT0002: Connection restored to 1f599749-8842-0a1f-3c65-2097372f0500 (at 10.9.104.59@o2ib4)
[390629.111731] Lustre: fir-MDT0002: Connection restored to 4b44f2dc-3b6a-af34-cda4-e629852174a7 (at 10.9.104.56@o2ib4)
[395769.640906] LNetError: 67090:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0)
[395769.653521] LNetError: 67090:0:(lib-msg.c:822:lnet_is_health_check()) Skipped 1 previous similar message
[395770.679747] LNetError: 67089:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0)
[395770.692363] LNetError: 67089:0:(lib-msg.c:822:lnet_is_health_check()) Skipped 24 previous similar messages
[395773.046813] LNetError: 67093:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0)
[395773.059428] LNetError: 67093:0:(lib-msg.c:822:lnet_is_health_check()) Skipped 15 previous similar messages
[395776.672043] Lustre: fir-MDT0002: Client ba7d4754-732a-6c3c-9b97-094af9e08a5e (at 10.8.21.25@o2ib6) reconnecting
[395776.682216] Lustre: Skipped 126 previous similar messages
[395776.687731] Lustre: fir-MDT0002: Connection restored to (at 10.8.21.25@o2ib6)
[395776.695066] Lustre: Skipped 1 previous similar message
[395777.369662] Lustre: fir-MDT0002: Connection restored to 4e071a00-9822-fefb-0949-30f0151ede35 (at 10.8.20.35@o2ib6)
[395777.380097] Lustre: Skipped 11 previous similar messages
[395778.426529] Lustre: fir-MDT0002: Connection restored to (at 10.8.20.33@o2ib6)
[395778.433853] Lustre: Skipped 9 previous similar messages
[395780.784329] Lustre: fir-MDT0002: Connection restored to (at 10.8.21.13@o2ib6)
[395780.791648] Lustre: Skipped 1 previous similar message
[397482.757674] Lustre: fir-MDT0002: haven't heard from client fe459a79-a836-066c-939e-a6451e11dc82 (at 10.8.28.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a71b96cf800, cur 1573229847 expire 1573229697 last 1573229620
[398195.866242] Lustre: fir-MDT0002: Connection restored to 0241409c-53ad-fe0a-f229-1a8e96abaf04 (at 10.9.104.57@o2ib4)
[399045.308487] Lustre: fir-MDT0002: Connection restored to (at 10.8.28.4@o2ib6)
[400142.633553] Lustre: fir-MDT0002: Connection restored to a57e533e-d09b-82c8-6ac7-955030c91ab1 (at 10.9.104.62@o2ib4)
[400156.132385] Lustre: fir-MDT0002: Connection restored to 636b5125-7f54-ee19-2fdd-a98fadef91cf (at 10.9.104.64@o2ib4)
[400876.844431] Lustre: fir-MDT0002: haven't heard from client 826de676-3c6c-d973-ea7e-0cedb088f58a (at 10.8.24.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a61be4c4800, cur 1573233241 expire 1573233091 last 1573233014
[402038.510777] Lustre: fir-MDT0002: Connection restored to (at 10.9.116.8@o2ib4)
[402359.988150] Lustre: fir-MDT0002: Connection restored to (at 10.8.24.22@o2ib6)
[402414.141401] Lustre: fir-MDT0002: Connection restored to (at 10.8.26.5@o2ib6)
[402417.037539] Lustre: fir-MDT0002: Connection restored to 267d11cd-36fd-574c-4249-46ef041b9a3f (at 10.8.30.30@o2ib6)
[402432.173826] Lustre: fir-MDT0002: Connection restored to c81b9097-f5ef-df90-5d08-e87e9d576b38 (at 10.8.30.18@o2ib6)
[402463.464615] Lustre: fir-MDT0002: Connection restored to (at 10.8.31.3@o2ib6)
[402483.087497] Lustre: fir-MDT0002: Connection restored to (at 10.8.24.30@o2ib6)
[404483.957403] Lustre: fir-MDT0002: haven't heard from client 432914ae-9a3c-72ae-e7ea-2d2bb1fbee2e (at 10.9.109.56@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a7898b59400, cur 1573236848 expire 1573236698 last 1573236621
[404483.979290] Lustre: Skipped 5 previous similar messages
[405189.420397] Lustre: fir-MDT0002: Connection restored to 8b633e26-6674-1118-73a2-cadebb615cfb (at 10.8.23.14@o2ib6)
[405220.957162] Lustre: fir-MDT0002: haven't heard from client a481f9ce-26d7-a837-6f03-d23b2f50d014 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a60d1763000, cur 1573237585 expire 1573237435 last 1573237358
[405220.978950] Lustre: Skipped 1 previous similar message
[405607.859805] Lustre: fir-MDT0002: Connection restored to c3c3c828-0658-8253-2e2b-6619a84333a7 (at 10.8.29.8@o2ib6)
[405760.654108] Lustre: fir-MDT0002: Connection restored to efa841f1-8cbc-8f62-0b65-4658ef1a3b8e (at 10.9.109.37@o2ib4)
[405844.440831] Lustre: fir-MDT0002: Connection restored to 6629cdea-3e58-e4a4-7a12-64efc3dfd807 (at 10.9.109.56@o2ib4)
[405978.828060] Lustre: fir-MDT0002: Connection restored to (at 10.8.21.9@o2ib6)
[405978.835287] Lustre: Skipped 2 previous similar messages
[406176.920542] Lustre: fir-MDT0002: Connection restored to (at 10.8.27.21@o2ib6)
[406176.927855] Lustre: Skipped 2 previous similar messages
[406321.527271] Lustre: fir-MDT0002: Connection restored to fc85a6cc-3249-1d3e-9a39-9bb09055d536 (at 10.9.105.33@o2ib4)
[406420.987954] Lustre: fir-MDT0002: haven't heard from client 24fc6d75-bc53-bcb0-3dcc-7e4b3eda067a (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a77d1606c00, cur 1573238785 expire 1573238635 last 1573238558
[410446.600098] Lustre: fir-MDT0002: Connection restored to 5d5443a0-a51c-2ec1-3f41-1b334ea2df44 (at 10.9.104.29@o2ib4)
[410446.610620] Lustre: Skipped 1 previous similar message
[417168.003424] Lustre: fir-MDT0002: Connection restored to (at 10.8.19.7@o2ib6)
[417193.824543] Lustre: fir-MDT0002: Connection restored to (at 10.8.19.5@o2ib6)
[417193.831770] Lustre: Skipped 2 previous similar messages
[417496.282720] Lustre: fir-MDT0002: haven't heard from client 269761cd-40ee-68f1-4dd8-d006fe5de9af (at 10.8.19.7@o2ib6) in 227 seconds. I think it's dead, and I am evicting it.
exp ffff9a5dcd971800, cur 1573249860 expire 1573249710 last 1573249633 [418660.971704] Lustre: fir-MDT0002: Connection restored to (at 10.8.19.7@o2ib6) [418688.258971] Lustre: fir-MDT0002: Connection restored to (at 10.8.19.5@o2ib6) [418698.114811] Lustre: fir-MDT0002: Connection restored to 261451ed-7be0-0399-71ca-f026e4d8d247 (at 10.8.19.1@o2ib6) [418710.392585] Lustre: fir-MDT0002: Connection restored to 35232254-7a61-77af-f0c7-aacf16976cf2 (at 10.8.19.3@o2ib6) [419845.823776] Lustre: fir-MDT0002: Connection restored to (at 10.9.104.70@o2ib4) [425142.474039] Lustre: fir-MDT0002: haven't heard from client 5b414e12-55a1-fe9e-1068-444d4b6c6c34 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a65cfb19c00, cur 1573257506 expire 1573257356 last 1573257279 [425142.495744] Lustre: Skipped 3 previous similar messages [425203.733729] Lustre: fir-MDT0002: Connection restored to (at 10.8.26.4@o2ib6) [429471.579336] Lustre: fir-MDT0002: haven't heard from client 69caf5ed-e70e-b3b8-8b0a-2faf98296cc4 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a640b0d4800, cur 1573261835 expire 1573261685 last 1573261608 [429526.659696] Lustre: fir-MDT0002: Connection restored to (at 10.8.26.4@o2ib6) [429752.282922] Lustre: fir-MDT0002: Connection restored to (at 10.8.26.4@o2ib6) [429818.588376] Lustre: fir-MDT0002: haven't heard from client 089a38fa-dd70-ab4d-6c51-b6a560ad0cef (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a646a741800, cur 1573262182 expire 1573262032 last 1573261955 [429934.875408] Lustre: fir-MDT0002: Connection restored to (at 10.8.26.4@o2ib6) [429978.595488] Lustre: fir-MDT0002: haven't heard from client 6d90523c-c4a5-b222-c543-21c8aa91d3c1 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a731861dc00, cur 1573262342 expire 1573262192 last 1573262115 [430014.905948] Lustre: fir-MDT0002: Connection restored to 8b633e26-6674-1118-73a2-cadebb615cfb (at 10.8.23.14@o2ib6) [430974.077132] Lustre: fir-MDT0002: Connection restored to (at 10.8.26.4@o2ib6) [431004.618573] Lustre: fir-MDT0002: haven't heard from client 215c8d52-d194-3d41-134b-493e34d0e7fb (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6b3b9f7800, cur 1573263368 expire 1573263218 last 1573263141 [431004.640280] Lustre: Skipped 1 previous similar message [431932.805970] Lustre: fir-MDT0002: Connection restored to (at 10.8.26.4@o2ib6) [431993.645910] Lustre: fir-MDT0002: haven't heard from client b4a09ef5-78d9-2290-45c0-6026c1169f11 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a7bd4747800, cur 1573264357 expire 1573264207 last 1573264130 [432127.746679] Lustre: fir-MDT0002: Connection restored to (at 10.8.26.4@o2ib6) [432159.651353] Lustre: fir-MDT0002: haven't heard from client c16cdc7a-94da-d37e-89da-a4ddcfaf75f6 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a634e755000, cur 1573264523 expire 1573264373 last 1573264296 [432583.665556] Lustre: fir-MDT0002: haven't heard from client facb58f6-d35d-e7cb-2ac0-ff1e59836c54 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9a5fa7350800, cur 1573264947 expire 1573264797 last 1573264720 [432728.137157] Lustre: fir-MDT0002: Connection restored to (at 10.8.26.4@o2ib6) [433834.694975] Lustre: fir-MDT0002: haven't heard from client 2c2907d8-826e-847f-292c-27530dc08031 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6e27847c00, cur 1573266198 expire 1573266048 last 1573265971 [433847.239519] Lustre: fir-MDT0002: Connection restored to (at 10.8.26.4@o2ib6) [434636.784929] Lustre: fir-MDT0002: Connection restored to (at 10.8.26.4@o2ib6) [434677.717971] Lustre: fir-MDT0002: haven't heard from client 70e39440-8f82-4a8d-5898-e70e5b221687 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6715f3cc00, cur 1573267041 expire 1573266891 last 1573266814 [434816.627742] Lustre: fir-MDT0002: Connection restored to (at 10.8.26.4@o2ib6) [434863.721326] Lustre: fir-MDT0002: haven't heard from client 20c3cc6e-1334-4360-a9e5-191e4ce55df5 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a68850cf400, cur 1573267227 expire 1573267077 last 1573267000 [435872.763723] Lustre: fir-MDT0002: haven't heard from client c6f97bdc-0fb0-208f-46fd-db98edd7ae56 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a792c2c6800, cur 1573268236 expire 1573268086 last 1573268009 [436013.942199] Lustre: fir-MDT0002: Connection restored to (at 10.8.26.4@o2ib6) [439539.445692] Lustre: fir-MDT0002: Connection restored to 8b633e26-6674-1118-73a2-cadebb615cfb (at 10.8.23.14@o2ib6) [439575.844767] Lustre: fir-MDT0002: haven't heard from client 4ca74394-bade-feb3-e741-43c94e52e19f (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a5c8eeed000, cur 1573271939 expire 1573271789 last 1573271712 [444976.064867] Lustre: fir-MDT0002: Connection restored to (at 10.8.26.4@o2ib6) [445022.984577] Lustre: fir-MDT0002: haven't heard from client 9dd87976-1010-0fcb-f8a9-c9d3879413a0 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a5642574400, cur 1573277386 expire 1573277236 last 1573277159 [445165.243137] Lustre: fir-MDT0002: Connection restored to (at 10.8.26.4@o2ib6) [445202.997144] Lustre: fir-MDT0002: haven't heard from client a83b3b25-cc53-9ef9-cc08-3c5150e43d7c (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a5f5e84d800, cur 1573277566 expire 1573277416 last 1573277339 [490989.558768] Lustre: fir-MDT0002: Connection restored to 070d1bb3-881e-ed24-e666-f2c8f957d216 (at 10.9.107.2@o2ib4) [493667.837937] Lustre: fir-MDT0002: Connection restored to 8b633e26-6674-1118-73a2-cadebb615cfb (at 10.8.23.14@o2ib6) [493706.246315] Lustre: fir-MDT0002: haven't heard from client 040b0130-7124-b70b-4919-4f3fb885c625 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9a6f6e45d800, cur 1573326068 expire 1573325918 last 1573325841 [496147.432808] Lustre: fir-MDT0002: Connection restored to (at 10.9.104.43@o2ib4) [496154.581636] Lustre: fir-MDT0002: Connection restored to 731139c2-1b80-91bb-a5ce-222ef2ee9e62 (at 10.9.104.42@o2ib4) [496178.162930] Lustre: fir-MDT0002: Connection restored to 80c7cad9-bc5d-d26e-4db9-df97d9f13930 (at 10.9.104.39@o2ib4) [502081.748322] Lustre: fir-MDT0002: Connection restored to 8b633e26-6674-1118-73a2-cadebb615cfb (at 10.8.23.14@o2ib6) [502113.445086] Lustre: fir-MDT0002: haven't heard from client dce97d30-7cb6-b285-b8a1-47afb716f313 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6d47a8dc00, cur 1573334475 expire 1573334325 last 1573334248 [507123.657699] Lustre: fir-MDT0002: Connection restored to (at 10.9.103.41@o2ib4) [507542.016389] Lustre: fir-MDT0002: Connection restored to 1cd3427d-07b9-c717-b164-e56e08e3797d (at 10.9.104.41@o2ib4) [507547.377168] Lustre: fir-MDT0002: Connection restored to 5ab78d57-f2a2-6bde-6a71-52927141dfd7 (at 10.9.104.46@o2ib4) [520409.120234] Lustre: fir-MDT0002: Connection restored to 8b633e26-6674-1118-73a2-cadebb615cfb (at 10.8.23.14@o2ib6) [520409.130671] Lustre: Skipped 1 previous similar message [520448.940770] Lustre: fir-MDT0002: haven't heard from client 94881ec3-234c-73bd-9a4c-5b1f1d67a62c (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a7b103a9c00, cur 1573352810 expire 1573352660 last 1573352583 [524111.684747] Lustre: fir-MDT0002: Connection restored to (at 10.9.104.54@o2ib4) [537269.362655] Lustre: fir-MDT0002: haven't heard from client ff3255f0-5d27-da5f-9645-cda7063179e3 (at 10.8.24.3@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a61b47dc000, cur 1573369630 expire 1573369480 last 1573369403 [537943.171489] Lustre: fir-MDT0002: Connection restored to (at 10.9.103.8@o2ib4) [552161.479930] Lustre: fir-MDT0002: Connection restored to d121ebe8-5d01-8438-9301-326f537c0516 (at 10.9.117.24@o2ib4) [552176.748644] Lustre: fir-MDT0002: haven't heard from client 6aea4482-ee18-1a1f-e6d7-08161ad333db (at 10.9.117.24@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a77b2e80400, cur 1573384537 expire 1573384387 last 1573384310 [552757.761589] Lustre: fir-MDT0002: haven't heard from client 848e3626-e186-041e-186e-860108c092de (at 10.9.117.23@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a891dee3800, cur 1573385118 expire 1573384968 last 1573384891 [552765.638875] Lustre: fir-MDT0002: Connection restored to ec5357f4-3a41-e113-e586-2392fb551089 (at 10.9.117.23@o2ib4) [556894.233235] Lustre: fir-MDT0002: Connection restored to (at 10.9.117.26@o2ib4) [556910.867164] Lustre: fir-MDT0002: haven't heard from client 6e8504a4-2b98-4a45-eba2-5f340ac8a193 (at 10.9.117.26@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a7549fee800, cur 1573389271 expire 1573389121 last 1573389044 [557958.896338] Lustre: fir-MDT0002: haven't heard from client fa666365-87ba-9da3-cb4a-247b8a342229 (at 10.9.106.54@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a73039a1400, cur 1573390319 expire 1573390169 last 1573390092 [558088.713153] Lustre: fir-MDT0002: Connection restored to (at 10.9.106.54@o2ib4) [562859.019182] Lustre: fir-MDT0002: haven't heard from client 49cc9217-178e-a367-78f6-8de0b53f39ba (at 10.9.109.27@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9a91bed0d400, cur 1573395219 expire 1573395069 last 1573394992 [562911.894193] Lustre: fir-MDT0002: Connection restored to 49cc9217-178e-a367-78f6-8de0b53f39ba (at 10.9.109.27@o2ib4) [563890.356314] Lustre: fir-MDT0002: Connection restored to ba15a10b-a95d-133b-f873-39f741c8accb (at 10.9.115.11@o2ib4) [563919.045473] Lustre: fir-MDT0002: haven't heard from client f59b36e4-be18-7032-d46b-c4c65b244eaa (at 10.9.115.11@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a8e28fea400, cur 1573396279 expire 1573396129 last 1573396052 [565970.247368] Lustre: fir-MDT0002: Connection restored to (at 10.9.114.7@o2ib4) [565987.100005] Lustre: fir-MDT0002: haven't heard from client 082da037-7f94-11f0-0c3a-2bc195eec1a9 (at 10.9.114.7@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a7b0f90ec00, cur 1573398347 expire 1573398197 last 1573398120 [566020.402259] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.114.7@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. [566020.419628] LustreError: Skipped 52 previous similar messages [566120.756936] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.114.7@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. [567246.133111] Lustre: fir-MDT0002: haven't heard from client a77feb89-5770-353c-ef78-3eabe68eb0de (at 10.9.107.29@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a7c54728800, cur 1573399606 expire 1573399456 last 1573399379 [567950.152908] Lustre: fir-MDT0002: haven't heard from client fdbec848-75a2-11a9-0f77-113c6568605e (at 10.9.107.32@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a61aa95f000, cur 1573400310 expire 1573400160 last 1573400083 [569397.953363] Lustre: fir-MDT0002: Connection restored to ba15a10b-a95d-133b-f873-39f741c8accb (at 10.9.115.11@o2ib4) [569401.188443] Lustre: fir-MDT0002: haven't heard from client 751950a2-9cb0-72f8-8cca-e39ea00b70af (at 10.9.115.11@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a74387f0c00, cur 1573401761 expire 1573401611 last 1573401534 [569477.190076] Lustre: fir-MDT0002: haven't heard from client cc613430-d823-445f-74cf-8b25190f3cd2 (at 10.9.116.4@o2ib4) in 219 seconds. I think it's dead, and I am evicting it. exp ffff9a66126ec800, cur 1573401837 expire 1573401687 last 1573401618 [569713.246352] Lustre: fir-MDT0002: Connection restored to (at 10.9.116.4@o2ib4) [569763.402926] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.116.4@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. [572380.178714] Lustre: fir-MDT0002: Connection restored to (at 10.9.114.7@o2ib4) [572411.267384] Lustre: fir-MDT0002: haven't heard from client f92edafe-8048-ef94-bd2b-ceb7e9ae856d (at 10.9.114.7@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a90e4b80400, cur 1573404771 expire 1573404621 last 1573404544 [573524.699899] Lustre: fir-MDT0002: Connection restored to c97c5347-abcf-7107-76a0-2ac126169b62 (at 10.9.104.33@o2ib4) [573574.795202] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.104.33@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. [573582.297911] Lustre: fir-MDT0002: haven't heard from client 0122217a-243d-d3ef-07de-9885a9b82c8d (at 10.9.104.33@o2ib4) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff9a534d6c9c00, cur 1573405942 expire 1573405792 last 1573405715 [573675.149742] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.104.33@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. [574216.313730] Lustre: fir-MDT0002: haven't heard from client 7b9bac1a-99e6-679a-0a2f-87440050237e (at 10.8.28.10@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a619baf8400, cur 1573406576 expire 1573406426 last 1573406349 [574216.335518] Lustre: Skipped 1 previous similar message [574271.596574] Lustre: fir-MDT0002: Connection restored to be5f4dd4-c6e6-8e2c-a807-eb173a22b459 (at 10.9.104.35@o2ib4) [574272.931260] Lustre: fir-MDT0002: Connection restored to eae7c8f6-dc9e-7749-c682-cf3cd37373ad (at 10.9.103.46@o2ib4) [574292.315442] Lustre: fir-MDT0002: haven't heard from client 700c51e8-8677-3084-d60e-127de20855b5 (at 10.9.104.35@o2ib4) in 209 seconds. I think it's dead, and I am evicting it. exp ffff9a714a8ffc00, cur 1573406652 expire 1573406502 last 1573406443 [574292.337317] Lustre: Skipped 1 previous similar message [574323.029727] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.103.46@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. [574368.317391] Lustre: fir-MDT0002: haven't heard from client 9b0243f0-08e0-26e5-6091-96ad153ef3a0 (at 10.9.106.37@o2ib4) in 205 seconds. I think it's dead, and I am evicting it. exp ffff9a91a7fcb400, cur 1573406728 expire 1573406578 last 1573406523 [574370.929347] Lustre: fir-MDT0002: Connection restored to 9b0243f0-08e0-26e5-6091-96ad153ef3a0 (at 10.9.106.37@o2ib4) [574423.385451] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.103.46@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. [574444.320492] Lustre: fir-MDT0002: haven't heard from client 08935864-5013-f2b6-04dd-b97f96f8361f (at 10.9.109.20@o2ib4) in 210 seconds. I think it's dead, and I am evicting it. exp ffff9a90a3673000, cur 1573406804 expire 1573406654 last 1573406594 [574523.741286] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.103.46@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. [574945.969291] Lustre: fir-MDT0002: Connection restored to 712d7479-4f13-d033-d183-215e0c4661ac (at 10.9.105.52@o2ib4) [574994.345401] Lustre: fir-MDT0002: haven't heard from client 712d7479-4f13-d033-d183-215e0c4661ac (at 10.9.105.52@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a919d653000, cur 1573407354 expire 1573407204 last 1573407127 [575458.507266] Lustre: fir-MDT0002: Connection restored to (at 10.9.104.45@o2ib4) [575508.614299] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.104.45@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. [575532.349272] Lustre: fir-MDT0002: haven't heard from client 4b995233-ea3f-572f-2a20-96a209c8cf92 (at 10.9.104.45@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a7aeafcd800, cur 1573407892 expire 1573407742 last 1573407665 [576022.106885] Lustre: fir-MDT0002: Connection restored to (at 10.9.103.47@o2ib4) [576036.382890] Lustre: fir-MDT0002: haven't heard from client 61a09df0-267a-de8a-c7f1-c1900de89298 (at 10.9.103.47@o2ib4) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff9a5400518c00, cur 1573408396 expire 1573408246 last 1573408169 [576036.404774] Lustre: Skipped 1 previous similar message [576112.439548] Lustre: fir-MDT0002: haven't heard from client 2d82d3e6-4c1e-f574-b999-31201d63aca1 (at 10.9.109.40@o2ib4) in 159 seconds. I think it's dead, and I am evicting it. exp ffff9a618b59fc00, cur 1573408472 expire 1573408322 last 1573408313 [576807.520374] Lustre: fir-MDT0002: Connection restored to c16058f5-73a7-a995-ef28-715a84706bc6 (at 10.9.104.55@o2ib4) [576857.510744] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.104.55@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. [576901.389321] Lustre: fir-MDT0002: haven't heard from client ddc81b4b-751d-a756-706b-e95e8228ec1d (at 10.9.106.38@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a91ba2b1400, cur 1573409261 expire 1573409111 last 1573409034 [576923.808982] Lustre: fir-MDT0002: Connection restored to ddc81b4b-751d-a756-706b-e95e8228ec1d (at 10.9.106.38@o2ib4) [576957.865283] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.104.55@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. [576973.918026] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.106.38@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. [577008.482560] Lustre: fir-MDT0002: Connection restored to (at 10.9.101.57@o2ib4) [577058.219842] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.104.55@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. [577074.273451] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.106.38@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. [577074.290940] LustreError: Skipped 1 previous similar message [577079.387913] Lustre: fir-MDT0002: haven't heard from client a8270da0-0393-37e4-ff36-7fec1cb9e404 (at 10.9.101.57@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a911c210c00, cur 1573409439 expire 1573409289 last 1573409212 [577174.446947] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.104.55@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. [577553.659729] Lustre: fir-MDT0002: Connection restored to 4923295e-2d94-d746-5665-538467be8b20 (at 10.9.102.4@o2ib4) [577590.470162] Lustre: fir-MDT0002: haven't heard from client 9eaf1e60-c3ed-bbaa-03d8-ebf03754c6fa (at 10.9.102.4@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6151ab9000, cur 1573409950 expire 1573409800 last 1573409723 [577779.404513] Lustre: fir-MDT0002: haven't heard from client e0edfcab-36c4-4cd8-8215-4dd0ded664f0 (at 10.9.109.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a90f9abe800, cur 1573410139 expire 1573409989 last 1573409912 [577804.513928] Lustre: fir-MDT0002: Connection restored to (at 10.9.110.34@o2ib4) [577859.122505] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.109.22@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. 
[579035.481574] Lustre: Failing over fir-MDT0002 [579035.511311] Lustre: fir-MDT0002: Not available for connect from 10.9.103.17@o2ib4 (stopping) [579035.519844] Lustre: Skipped 1 previous similar message [579035.636720] LustreError: 11-0: fir-MDT0003-osp-MDT0002: operation mds_disconnect to node 10.0.10.54@o2ib7 failed: rc = -107 [579035.647932] LustreError: Skipped 1 previous similar message [579036.023326] Lustre: fir-MDT0002: Not available for connect from 10.8.26.31@o2ib6 (stopping) [579036.031775] Lustre: Skipped 49 previous similar messages [579037.028681] Lustre: fir-MDT0002: Not available for connect from 10.9.110.10@o2ib4 (stopping) [579037.037215] Lustre: Skipped 79 previous similar messages [579038.673662] LustreError: 94584:0:(ldlm_lockd.c:2324:ldlm_cancel_handler()) ldlm_cancel from 10.9.109.30@o2ib4 arrived at 1573411398 with bad export cookie 8388758008466618538 [579039.183077] Lustre: fir-MDT0002: Not available for connect from 10.9.105.25@o2ib4 (stopping) [579039.191601] Lustre: Skipped 132 previous similar messages [579039.778820] LustreError: 74348:0:(ldlm_lockd.c:2324:ldlm_cancel_handler()) ldlm_cancel from 10.8.29.2@o2ib6 arrived at 1573411399 with bad export cookie 8388758008466624236 [579042.668803] LustreError: 94584:0:(ldlm_lockd.c:2324:ldlm_cancel_handler()) ldlm_cancel from 10.9.0.81@o2ib4 arrived at 1573411402 with bad export cookie 8388758008466624369 [579043.183384] Lustre: fir-MDT0002: Not available for connect from 10.8.8.24@o2ib6 (stopping) [579043.191752] Lustre: Skipped 281 previous similar messages [579046.579490] LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.9.101.7@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. [579047.133060] Lustre: server umount fir-MDT0002 complete [579093.566854] LNetError: 112317:0:(o2iblnd_cb.c:2495:kiblnd_passive_connect()) Can't accept conn from 10.0.10.212@o2ib7 on NA (ib0:1:10.0.10.53): bad dst nid 10.0.10.53@o2ib7 [579094.067846] LNetError: 112317:0:(o2iblnd_cb.c:2495:kiblnd_passive_connect()) Can't accept conn from 10.0.10.211@o2ib7 on NA (ib0:1:10.0.10.53): bad dst nid 10.0.10.53@o2ib7 [579094.083305] LNetError: 112317:0:(o2iblnd_cb.c:2495:kiblnd_passive_connect()) Skipped 3 previous similar messages [579095.092747] LNet: Removed LNI 10.0.10.53@o2ib7 [579391.646556] LNet: HW NUMA nodes: 4, HW CPU cores: 48, npartitions: 4 [579391.654215] alg: No test for adler32 (adler32-zlib) [579392.481365] Lustre: Lustre: Build Version: 2.12.3_2_gb033996 [579392.614607] LNet: 59222:0:(config.c:1627:lnet_inet_enumerate()) lnet: Ignoring interface em2: it's down [579392.624540] LNet: Using FastReg for registration [579392.641380] LNet: Added LNI 10.0.10.53@o2ib7 [8/256/0/180] [579394.934290] LDISKFS-fs (dm-0): file extents enabled, maximum tree depth=5 [579395.025310] LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,acl,no_mbcache,nodelalloc [579395.618284] LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.9.112.4@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. [579395.635670] LustreError: Skipped 2 previous similar messages [579396.159327] LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.9.108.16@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. 
[579396.176779] LustreError: Skipped 18 previous similar messages [579397.230163] LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.9.105.41@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. [579397.247645] LustreError: Skipped 17 previous similar messages [579399.274978] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.102.70@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. [579399.292438] LustreError: Skipped 69 previous similar messages [579403.363206] LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.9.0.62@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. [579403.380493] LustreError: Skipped 118 previous similar messages [579411.756880] LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.9.110.34@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. [579411.774328] LustreError: Skipped 167 previous similar messages [579412.647628] LNetError: 59283:0:(peer.c:3451:lnet_peer_ni_add_to_recoveryq_locked()) lpni 10.0.10.54@o2ib7 added to recovery queue. Health = 900 [579429.707668] LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.23.36@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. [579429.725050] LustreError: Skipped 108 previous similar messages [579454.678966] Lustre: fir-MDT0002: Imperative Recovery not enabled, recovery window 300-900 [579454.894927] Lustre: fir-MDD0002: changelog on [579454.914038] Lustre: fir-MDT0002: in recovery but waiting for the first client to connect [579454.938114] Lustre: fir-MDT0002: Will be in recovery for at least 5:00, or until 1283 clients reconnect [579455.947027] Lustre: fir-MDT0002: Connection restored to (at 10.8.22.19@o2ib6) [579455.954343] Lustre: Skipped 35 previous similar messages [579456.543205] Lustre: fir-MDT0002: Connection restored to (at 10.9.107.52@o2ib4) [579456.550609] Lustre: Skipped 18 previous similar messages [579457.549559] Lustre: fir-MDT0002: Connection restored to (at 10.9.103.38@o2ib4) [579457.556965] Lustre: Skipped 29 previous similar messages [579459.548864] Lustre: fir-MDT0002: Connection restored to (at 10.9.103.58@o2ib4) [579459.556261] Lustre: Skipped 513 previous similar messages [579461.718943] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.112.6@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. 
[579461.736315] LustreError: Skipped 1258 previous similar messages [579463.602971] Lustre: fir-MDT0002: Connection restored to (at 10.0.10.116@o2ib7) [579463.610379] Lustre: Skipped 712 previous similar messages [579474.158036] Lustre: fir-MDT0002: Connection restored to (at 10.0.10.101@o2ib7) [579474.165460] Lustre: Skipped 15 previous similar messages [579479.780098] Lustre: fir-MDT0002: Denying connection for new client 97102c2b-e0e2-553a-c933-88dc912145da (at 10.9.115.11@o2ib4), waiting for 1283 known clients (1276 recovered, 4 in progress, and 0 evicted) to recover in 4:35 [579504.868575] Lustre: fir-MDT0002: Denying connection for new client 97102c2b-e0e2-553a-c933-88dc912145da (at 10.9.115.11@o2ib4), waiting for 1283 known clients (1277 recovered, 4 in progress, and 0 evicted) to recover in 4:10 [579529.957208] Lustre: fir-MDT0002: Denying connection for new client 97102c2b-e0e2-553a-c933-88dc912145da (at 10.9.115.11@o2ib4), waiting for 1283 known clients (1277 recovered, 4 in progress, and 0 evicted) to recover in 3:44 [579534.824840] LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.0.10.51@o2ib7 (no target). If you are running an HA pair check that the target is mounted on the other server. [579534.842213] LustreError: Skipped 116 previous similar messages [579555.045907] Lustre: fir-MDT0002: Denying connection for new client 97102c2b-e0e2-553a-c933-88dc912145da (at 10.9.115.11@o2ib4), waiting for 1283 known clients (1277 recovered, 4 in progress, and 0 evicted) to recover in 3:19 [579557.129168] Lustre: fir-MDT0002: Connection restored to (at 10.0.10.54@o2ib7) [579557.136482] Lustre: Skipped 37 previous similar messages [579580.134438] Lustre: fir-MDT0002: Denying connection for new client 97102c2b-e0e2-553a-c933-88dc912145da (at 10.9.115.11@o2ib4), waiting for 1283 known clients (1278 recovered, 4 in progress, and 0 evicted) to recover in 2:54 [579605.223128] Lustre: fir-MDT0002: Denying connection for new client 97102c2b-e0e2-553a-c933-88dc912145da (at 10.9.115.11@o2ib4), waiting for 1283 known clients (1278 recovered, 4 in progress, and 0 evicted) to recover in 2:29 [579630.311646] Lustre: fir-MDT0002: Denying connection for new client 97102c2b-e0e2-553a-c933-88dc912145da (at 10.9.115.11@o2ib4), waiting for 1283 known clients (1278 recovered, 4 in progress, and 0 evicted) to recover in 2:04 [579680.489098] Lustre: fir-MDT0002: Denying connection for new client 97102c2b-e0e2-553a-c933-88dc912145da (at 10.9.115.11@o2ib4), waiting for 1283 known clients (1278 recovered, 4 in progress, and 0 evicted) to recover in 1:14 [579680.509098] Lustre: Skipped 1 previous similar message [579753.184268] Lustre: fir-MDT0001-osp-MDT0002: Connection restored to 10.0.10.52@o2ib7 (at 10.0.10.52@o2ib7) [579754.945616] Lustre: fir-MDT0002: recovery is timed out, evict stale exports [579754.952989] Lustre: fir-MDT0002: disconnecting 1 stale clients [579754.990699] Lustre: fir-MDT0002: Recovery over after 5:00, of 1283 clients 1282 recovered and 1 was evicted. 
[579755.051490] LustreError: 11-0: fir-MDT0000-lwp-MDT0002: operation quota_acquire to node 10.0.10.51@o2ib7 failed: rc = -11 [579755.062541] LustreError: Skipped 4 previous similar messages [602266.190004] Lustre: fir-MDT0002: Connection restored to (at 10.9.112.15@o2ib4) [602266.197403] Lustre: Skipped 3 previous similar messages [608732.272754] Lustre: fir-MDT0002: Connection restored to ce822970-8477-81bb-c010-01bdec5282c4 (at 10.9.104.63@o2ib4) [613046.208985] Lustre: fir-MDT0002: Connection restored to (at 10.9.104.52@o2ib4) [614281.344106] Lustre: fir-MDT0002: haven't heard from client 8a8ce513-ded5-0297-ba5a-79430421de6c (at 10.9.104.51@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a91000ee400, cur 1573446640 expire 1573446490 last 1573446413 [616076.151147] Lustre: fir-MDT0002: Connection restored to (at 10.9.104.51@o2ib4) [618443.451801] Lustre: fir-MDT0002: haven't heard from client 76699ae9-25ef-53f4-8d8e-0a3d927f805c (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a9128a9f000, cur 1573450802 expire 1573450652 last 1573450575 [618646.224683] Lustre: fir-MDT0002: Connection restored to (at 10.8.23.14@o2ib6) [627715.287072] Lustre: fir-MDT0002: Connection restored to (at 10.9.104.53@o2ib4) [633520.448248] Lustre: fir-MDT0002: Connection restored to (at 10.9.104.50@o2ib4) [658080.471698] Lustre: fir-MDT0002: haven't heard from client b7b01ae9-b47f-28f8-6310-169cab8abc4d (at 10.9.101.57@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a91bedf0800, cur 1573490438 expire 1573490288 last 1573490211 [659402.462901] Lustre: fir-MDT0002: Connection restored to 837641b0-d89a-c20b-3139-4eb8fe8d733b (at 10.9.110.31@o2ib4) [659419.593926] Lustre: fir-MDT0002: Connection restored to b366b38e-0989-244a-d0be-ed9e847a9560 (at 10.9.107.29@o2ib4) [659442.142940] Lustre: fir-MDT0002: Connection restored to 974500ad-ebc3-d8e1-3490-e9ea5a76fa92 (at 10.9.107.32@o2ib4) [659448.049977] Lustre: fir-MDT0002: Connection restored to ba6d6b80-84f0-aa6a-2ba3-b0d9fb94a304 (at 10.9.109.40@o2ib4) [659485.237171] Lustre: fir-MDT0002: Connection restored to (at 10.9.109.27@o2ib4) [659529.426176] Lustre: fir-MDT0002: Connection restored to f93816ac-828b-55dd-4d69-fb089c1ad92a (at 10.9.109.20@o2ib4) [659529.436703] Lustre: Skipped 1 previous similar message [659619.675215] Lustre: fir-MDT0002: Connection restored to (at 10.8.23.14@o2ib6) [659732.332464] Lustre: fir-MDT0002: Connection restored to cc645112-3584-d084-5d6b-c64af0bf19ce (at 10.8.28.10@o2ib6) [659732.342904] Lustre: Skipped 1 previous similar message [659872.636796] Lustre: fir-MDT0002: Connection restored to (at 10.9.105.52@o2ib4) [659872.644203] Lustre: Skipped 1 previous similar message [659998.520501] Lustre: fir-MDT0002: haven't heard from client f61481ca-6812-a349-26e9-111c2a27f2d5 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a8d95bdf400, cur 1573492356 expire 1573492206 last 1573492129 [659998.542294] Lustre: Skipped 4 previous similar messages [660017.281440] Lustre: fir-MDT0002: Connection restored to (at 10.8.23.14@o2ib6) [660017.288758] Lustre: Skipped 1 previous similar message [660718.038698] Lustre: fir-MDT0002: Connection restored to (at 10.8.23.14@o2ib6) [660752.539223] Lustre: fir-MDT0002: haven't heard from client ddf8efa9-4639-9fe2-2e79-192a0ff01f74 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9a6f0de7f800, cur 1573493110 expire 1573492960 last 1573492883 [661235.570943] Lustre: fir-MDT0002: haven't heard from client 04bac371-f9cd-bb92-9cf6-adb15a24cc80 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a5a0773bc00, cur 1573493593 expire 1573493443 last 1573493366 [661418.556162] Lustre: fir-MDT0002: haven't heard from client cb2c9415-c175-4d75-a3f5-681842c71e9b (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a5c06134800, cur 1573493776 expire 1573493626 last 1573493549 [661757.088041] Lustre: fir-MDT0002: Connection restored to (at 10.8.23.14@o2ib6) [661757.095361] Lustre: Skipped 1 previous similar message [662549.275758] Lustre: fir-MDT0002: Connection restored to (at 10.8.23.14@o2ib6) [662561.604761] Lustre: fir-MDT0002: haven't heard from client a5f0168a-e2fc-b6e1-5a85-0215e29629ad (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6f0de79400, cur 1573494919 expire 1573494769 last 1573494692 [662978.594980] Lustre: fir-MDT0002: haven't heard from client a4283ad3-2714-c412-d740-ea70d1c5d9eb (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a8d28a11000, cur 1573495336 expire 1573495186 last 1573495109 [663399.606873] Lustre: fir-MDT0002: haven't heard from client 93151b0c-5d46-451c-d6bc-aa3f44a06cdc (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6bcea65400, cur 1573495757 expire 1573495607 last 1573495530 [663714.086330] Lustre: fir-MDT0002: Connection restored to (at 10.8.23.14@o2ib6) [663714.093668] Lustre: Skipped 3 previous similar messages [663940.619898] Lustre: fir-MDT0002: haven't heard from client 98ba2389-b6f7-6258-9042-c4ecf144bdae (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a70dd06e000, cur 1573496298 expire 1573496148 last 1573496071 [664308.630370] Lustre: fir-MDT0002: haven't heard from client f7c0659f-69d3-5c68-10cf-3264bcf799fe (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6df59f9000, cur 1573496666 expire 1573496516 last 1573496439 [664497.811240] Lustre: fir-MDT0002: Connection restored to (at 10.8.23.14@o2ib6) [664497.818558] Lustre: Skipped 2 previous similar messages [664558.636896] Lustre: fir-MDT0002: haven't heard from client 9371a5f7-bac3-5aa3-849d-bfa2c77fe302 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6dfcc0b000, cur 1573496916 expire 1573496766 last 1573496689 [665095.649677] Lustre: fir-MDT0002: haven't heard from client 6a5f8a2a-fdf8-abe3-0ef1-d0a2d787ee06 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a71322cd800, cur 1573497453 expire 1573497303 last 1573497226 [665165.272010] Lustre: fir-MDT0002: Connection restored to (at 10.8.23.14@o2ib6) [665694.665003] Lustre: fir-MDT0002: haven't heard from client 84fcace9-b9fd-5e93-64a8-17d20240f060 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a5c2d2d4c00, cur 1573498052 expire 1573497902 last 1573497825 [666116.469395] Lustre: fir-MDT0002: Connection restored to (at 10.8.23.14@o2ib6) [666116.476717] Lustre: Skipped 1 previous similar message [666140.675988] Lustre: fir-MDT0002: haven't heard from client 74c3c5dd-e7ab-95af-52e8-347a2e9ed20b (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9a7098661800, cur 1573498498 expire 1573498348 last 1573498271 [666645.689748] Lustre: fir-MDT0002: haven't heard from client 311b728e-f4e2-87f1-fd9a-89f205cae36d (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a67acb8d400, cur 1573499003 expire 1573498853 last 1573498776 [667531.470485] Lustre: fir-MDT0002: Connection restored to (at 10.8.23.14@o2ib6) [667531.477798] Lustre: Skipped 2 previous similar messages [667553.718509] Lustre: fir-MDT0002: haven't heard from client ec630e27-63c5-f028-8490-c47c257f8aa8 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a56fda24000, cur 1573499911 expire 1573499761 last 1573499684 [667818.718452] Lustre: fir-MDT0002: haven't heard from client fbbb34c3-5e5a-e4f2-a574-d56b8f3bcb7b (at 10.9.110.48@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a8de8789400, cur 1573500176 expire 1573500026 last 1573499949 [668060.724326] Lustre: fir-MDT0002: haven't heard from client 82e1ab8f-428f-9322-3688-e143c2548d51 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6fe6fa1800, cur 1573500418 expire 1573500268 last 1573500191 [668177.219175] Lustre: fir-MDT0002: Connection restored to (at 10.8.23.14@o2ib6) [668594.736322] Lustre: fir-MDT0002: haven't heard from client 11708536-1e42-81d6-98d9-e7b0edb6106f (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a5815d36800, cur 1573500952 expire 1573500802 last 1573500725 [668633.345331] Lustre: fir-MDT0002: Connection restored to (at 10.8.23.14@o2ib6) [668964.914239] LNetError: 59266:0:(o2iblnd_cb.c:3350:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds [668964.924497] LNetError: 59266:0:(o2iblnd_cb.c:3425:kiblnd_check_conns()) Timed out RDMA with 10.0.10.3@o2ib7 (107): c: 7, oc: 0, rc: 8 [669099.299936] Lustre: fir-MDT0002: Connection restored to (at 10.8.23.14@o2ib6) [669137.752236] Lustre: fir-MDT0002: haven't heard from client cf89ea1a-1f33-2949-fef6-09fd2cd57911 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a910a145800, cur 1573501495 expire 1573501345 last 1573501268 [669691.762931] Lustre: fir-MDT0002: haven't heard from client fcac6467-1995-f66d-0145-7e76f2874eb4 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a58519fd400, cur 1573502049 expire 1573501899 last 1573501822 [670032.350593] Lustre: fir-MDT0002: Connection restored to (at 10.8.23.14@o2ib6) [670032.357904] Lustre: Skipped 3 previous similar messages [670243.494978] LNet: Service thread pid 62541 was inactive for 200.17s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: [670243.511999] Pid: 62541, comm: mdt00_020 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 SMP Mon Aug 5 15:28:37 PDT 2019 [670243.522258] Call Trace: [670243.524815] [] ldlm_completion_ast+0x430/0x860 [ptlrpc] [670243.531847] [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] [670243.539042] [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] [670243.545793] [] osp_md_object_lock+0x162/0x2d0 [osp] [670243.552440] [] lod_object_lock+0xf3/0x7b0 [lod] [670243.558741] [] mdd_object_lock+0x3e/0xe0 [mdd] [670243.564956] [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] [670243.572306] [] mdt_remote_object_lock+0x2a/0x30 [mdt] [670243.579127] [] mdt_rename_lock+0xbe/0x4b0 [mdt] [670243.585435] [] mdt_reint_rename+0x2c5/0x2b90 [mdt] [670243.591996] [] mdt_reint_rec+0x83/0x210 [mdt] [670243.598132] [] mdt_reint_internal+0x6e3/0xaf0 [mdt] [670243.604780] [] mdt_reint+0x67/0x140 [mdt] [670243.610561] [] tgt_request_handle+0xaea/0x1580 [ptlrpc] [670243.617589] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [670243.625392] [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [670243.631803] [] kthread+0xd1/0xe0 [670243.636804] [] ret_from_fork_nospec_begin+0xe/0x21 [670243.643356] [] 0xffffffffffffffff [670243.648474] LustreError: dumping log to /tmp/lustre-log.1573502600.62541 [670245.543026] LNet: Service thread pid 62828 was inactive for 200.18s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: [670245.560052] Pid: 62828, comm: mdt02_041 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 SMP Mon Aug 5 15:28:37 PDT 2019 [670245.570314] Call Trace: [670245.572899] [] ldlm_completion_ast+0x430/0x860 [ptlrpc] [670245.579936] [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] [670245.587139] [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] [670245.593908] [] osp_md_object_lock+0x162/0x2d0 [osp] [670245.600574] [] lod_object_lock+0xf3/0x7b0 [lod] [670245.606891] [] mdd_object_lock+0x3e/0xe0 [mdd] [670245.613123] [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] [670245.620480] [] mdt_remote_object_lock+0x2a/0x30 [mdt] [670245.627318] [] mdt_rename_lock+0xbe/0x4b0 [mdt] [670245.633636] [] mdt_reint_rename+0x2c5/0x2b90 [mdt] [670245.640216] [] mdt_reint_rec+0x83/0x210 [mdt] [670245.646360] [] mdt_reint_internal+0x6e3/0xaf0 [mdt] [670245.653024] [] mdt_reint+0x67/0x140 [mdt] [670245.658823] [] tgt_request_handle+0xaea/0x1580 [ptlrpc] [670245.665867] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [670245.673680] [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [670245.680104] [] kthread+0xd1/0xe0 [670245.685106] [] ret_from_fork_nospec_begin+0xe/0x21 [670245.691678] [] 0xffffffffffffffff [670245.696792] LustreError: dumping log to /tmp/lustre-log.1573502602.62828 [670252.199195] LNet: Service thread pid 62674 was inactive for 200.28s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: [670252.216230] Pid: 62674, comm: mdt00_040 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 SMP Mon Aug 5 15:28:37 PDT 2019 [670252.226491] Call Trace: [670252.229080] [] ldlm_completion_ast+0x430/0x860 [ptlrpc] [670252.236113] [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] [670252.243305] [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] [670252.250065] [] osp_md_object_lock+0x162/0x2d0 [osp] [670252.256721] [] lod_object_lock+0xf3/0x7b0 [lod] [670252.263039] [] mdd_object_lock+0x3e/0xe0 [mdd] [670252.269271] [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] [670252.276624] [] mdt_remote_object_lock+0x2a/0x30 [mdt] [670252.283450] [] mdt_rename_lock+0xbe/0x4b0 [mdt] [670252.289772] [] mdt_reint_rename+0x2c5/0x2b90 [mdt] [670252.296338] [] mdt_reint_rec+0x83/0x210 [mdt] [670252.302472] [] mdt_reint_internal+0x6e3/0xaf0 [mdt] [670252.309120] [] mdt_reint+0x67/0x140 [mdt] [670252.314910] [] tgt_request_handle+0xaea/0x1580 [ptlrpc] [670252.321948] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [670252.329758] [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [670252.336180] [] kthread+0xd1/0xe0 [670252.341190] [] ret_from_fork_nospec_begin+0xe/0x21 [670252.347758] [] 0xffffffffffffffff [670252.352876] LustreError: dumping log to /tmp/lustre-log.1573502609.62674 [670271.655649] LNet: Service thread pid 62807 was inactive for 200.38s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: [670271.672668] Pid: 62807, comm: mdt01_067 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 SMP Mon Aug 5 15:28:37 PDT 2019 [670271.682929] Call Trace: [670271.685486] [] ldlm_completion_ast+0x430/0x860 [ptlrpc] [670271.692522] [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] [670271.699748] [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] [670271.706510] [] osp_md_object_lock+0x162/0x2d0 [osp] [670271.713173] [] lod_object_lock+0xf3/0x7b0 [lod] [670271.719485] [] mdd_object_lock+0x3e/0xe0 [mdd] [670271.725722] [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] [670271.733075] [] mdt_remote_object_lock+0x2a/0x30 [mdt] [670271.739916] [] mdt_rename_lock+0xbe/0x4b0 [mdt] [670271.746230] [] mdt_reint_rename+0x2c5/0x2b90 [mdt] [670271.752812] [] mdt_reint_rec+0x83/0x210 [mdt] [670271.758961] [] mdt_reint_internal+0x6e3/0xaf0 [mdt] [670271.765618] [] mdt_reint+0x67/0x140 [mdt] [670271.771426] [] tgt_request_handle+0xaea/0x1580 [ptlrpc] [670271.778458] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [670271.786281] [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [670271.792694] [] kthread+0xd1/0xe0 [670271.797709] [] ret_from_fork_nospec_begin+0xe/0x21 [670271.804273] [] 0xffffffffffffffff [670271.809396] LustreError: dumping log to /tmp/lustre-log.1573502629.62807 [670343.317380] Lustre: fir-MDT0000-osp-MDT0002: Connection to fir-MDT0000 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete [670343.333465] LustreError: 62541:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1573502400, 300s ago), entering recovery for fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff9a5eb46a8240/0xebb19283e1360ae3 lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 7 type: IBT flags: 0x1000001000000 nid: local remote: 0x2bdea4f17271d522 expref: -99 pid: 62541 timeout: 0 lvb_type: 0 [670345.363424] LustreError: 62828:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1573502402, 300s ago), entering 
recovery for fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff9a78b90caac0/0xebb19283e137af30 lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 7 type: IBT flags: 0x1000001000000 nid: local remote: 0x2bdea4f172851af5 expref: -99 pid: 62828 timeout: 0 lvb_type: 0 [670347.945475] LNet: Service thread pid 62465 was inactive for 200.04s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: [670347.962498] Pid: 62465, comm: mdt03_014 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 SMP Mon Aug 5 15:28:37 PDT 2019 [670347.972754] Call Trace: [670347.975305] [] ldlm_completion_ast+0x430/0x860 [ptlrpc] [670347.982341] [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] [670347.989540] [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] [670347.996301] [] osp_md_object_lock+0x162/0x2d0 [osp] [670348.002956] [] lod_object_lock+0xf3/0x7b0 [lod] [670348.009266] [] mdd_object_lock+0x3e/0xe0 [mdd] [670348.015488] [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] [670348.022845] [] mdt_remote_object_lock+0x2a/0x30 [mdt] [670348.029675] [] mdt_rename_lock+0xbe/0x4b0 [mdt] [670348.035986] [] mdt_reint_rename+0x2c5/0x2b90 [mdt] [670348.042555] [] mdt_reint_rec+0x83/0x210 [mdt] [670348.048692] [] mdt_reint_internal+0x6e3/0xaf0 [mdt] [670348.055349] [] mdt_reint+0x67/0x140 [mdt] [670348.061137] [] tgt_request_handle+0xaea/0x1580 [ptlrpc] [670348.068176] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [670348.075976] [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [670348.082390] [] kthread+0xd1/0xe0 [670348.087390] [] ret_from_fork_nospec_begin+0xe/0x21 [670348.093949] [] 0xffffffffffffffff [670348.099057] LustreError: dumping log to /tmp/lustre-log.1573502705.62465 [670351.911583] LustreError: 62674:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1573502409, 300s ago), entering recovery for fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff9a6fc73a2f40/0xebb19283e13a6adc lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 7 type: IBT flags: 0x1000001000000 nid: local remote: 0x2bdea4f172c140ef expref: -99 pid: 62674 timeout: 0 lvb_type: 0 [670371.277044] LustreError: 62807:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1573502428, 300s ago), entering recovery for fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff9a78334d9f80/0xebb19283e13fd5e6 lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 7 type: IBT flags: 0x1000001000000 nid: local remote: 0x2bdea4f173579e97 expref: -99 pid: 62807 timeout: 0 lvb_type: 0 [670447.900881] LustreError: 62465:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1573502505, 300s ago), entering recovery for fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff9a90c1663cc0/0xebb19283e157bb14 lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 8 type: IBT flags: 0x1000001000000 nid: local remote: 0x2bdea4f175f171f5 expref: -99 pid: 62465 timeout: 0 lvb_type: 0 [670526.397773] LNet: Service thread pid 62541 completed after 483.07s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
[670526.414038] LNet: Skipped 3 previous similar messages [671015.336471] Lustre: fir-MDT0002: Connection restored to (at 10.8.23.14@o2ib6) [671015.343781] Lustre: Skipped 1 previous similar message [671037.796012] Lustre: fir-MDT0002: haven't heard from client ddd9a1a2-b7c9-a3a3-65a1-2d344861bae4 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a69b9f3ac00, cur 1573503395 expire 1573503245 last 1573503168 [671037.817808] Lustre: Skipped 1 previous similar message [671443.804795] Lustre: fir-MDT0002: haven't heard from client 524ad9c0-b025-3972-05ae-6cabbfa21c3f (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a5a5a196400, cur 1573503801 expire 1573503651 last 1573503574 [671795.990203] Lustre: fir-MDT0002: Connection restored to (at 10.8.23.14@o2ib6) [671795.997520] Lustre: Skipped 1 previous similar message [671835.815296] Lustre: fir-MDT0002: haven't heard from client 2321bc4d-d325-b19b-eb55-ad2dc0bfdedf (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a8de3e9f400, cur 1573504193 expire 1573504043 last 1573503966 [672199.824003] Lustre: fir-MDT0002: haven't heard from client 6cc86182-4677-0743-ca17-0a5f3ed7c6fd (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6701f45400, cur 1573504557 expire 1573504407 last 1573504330 [672199.845796] Lustre: Skipped 1 previous similar message [672590.378653] Lustre: fir-MDT0002: Connection restored to (at 10.8.23.14@o2ib6) [672590.385974] Lustre: Skipped 1 previous similar message [673057.844622] Lustre: fir-MDT0002: haven't heard from client d6522fdf-138f-d055-2e61-0cef11247205 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a7e52d38800, cur 1573505415 expire 1573505265 last 1573505188 [673057.866416] Lustre: Skipped 1 previous similar message [674392.852513] Lustre: fir-MDT0002: Connection restored to (at 10.8.23.14@o2ib6) [674392.859831] Lustre: Skipped 1 previous similar message [674416.878119] Lustre: fir-MDT0002: haven't heard from client f9735571-189c-871d-7f02-b178411b4ea0 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a5c4fa5e800, cur 1573506774 expire 1573506624 last 1573506547 [674628.897716] Lustre: fir-MDT0002: haven't heard from client 5bb02c69-9e7d-b0f8-82b4-e6d50c869bbb (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a69b9f3e000, cur 1573506986 expire 1573506836 last 1573506759 [674995.371512] Lustre: fir-MDT0002: Connection restored to (at 10.8.23.14@o2ib6) [675427.220842] Lustre: fir-MDT0002: Connection restored to (at 10.8.23.14@o2ib6) [675462.905471] Lustre: fir-MDT0002: haven't heard from client a625dfbe-2367-ffc6-4754-9fe1b4f9eedf (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9a64829e9000, cur 1573507820 expire 1573507670 last 1573507593 [676722.261604] Lustre: 62496:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1573509072/real 1573509072] req@ffff9a8110946300 x1649841985483968/t0(0) o106->fir-MDT0002@10.8.23.14@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1573509079 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 [676729.288770] Lustre: 62496:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1573509079/real 1573509079] req@ffff9a8110946300 x1649841985483968/t0(0) o106->fir-MDT0002@10.8.23.14@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1573509086 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [676736.316937] Lustre: 62496:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1573509086/real 1573509086] req@ffff9a8110946300 x1649841985483968/t0(0) o106->fir-MDT0002@10.8.23.14@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1573509093 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [676743.344105] Lustre: 62496:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1573509093/real 1573509093] req@ffff9a8110946300 x1649841985483968/t0(0) o106->fir-MDT0002@10.8.23.14@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1573509100 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [676750.371277] Lustre: 62496:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1573509100/real 1573509100] req@ffff9a8110946300 x1649841985483968/t0(0) o106->fir-MDT0002@10.8.23.14@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1573509107 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [676750.402095] LustreError: 62496:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) ### client (nid 10.8.23.14@o2ib6) returned error from glimpse AST (req@ffff9a8110946300 x1649841985483968 status -107 rc -107), evict it ns: mdt-fir-MDT0002_UUID lock: ffff9a61620669c0/0xebb19283f37f6500 lrc: 4/0,0 mode: PW/PW res: [0x2c002beea:0x48d8:0x0].0x0 bits 0x40/0x0 rrc: 8 type: IBT flags: 0x40200000000000 nid: 10.8.23.14@o2ib6 remote: 0xa924aa6555941f71 expref: 53 pid: 62553 timeout: 0 lvb_type: 0 [676750.444689] LustreError: 138-a: fir-MDT0002: A client on nid 10.8.23.14@o2ib6 was evicted due to a lock glimpse callback time out: rc -107 [676750.457240] LustreError: 59389:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 295s: evicting client at 10.8.23.14@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff9a61620669c0/0xebb19283f37f6500 lrc: 3/0,0 mode: PW/PW res: [0x2c002beea:0x48d8:0x0].0x0 bits 0x40/0x0 rrc: 8 type: IBT flags: 0x40200000000000 nid: 10.8.23.14@o2ib6 remote: 0xa924aa6555941f71 expref: 54 pid: 62553 timeout: 0 lvb_type: 0 [676754.986540] Lustre: fir-MDT0002: Connection restored to (at 10.8.23.14@o2ib6) [677258.963322] Lustre: 62657:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1573509609/real 1573509609] req@ffff9a6f21eb7500 x1649842009032560/t0(0) o104->fir-MDT0002@10.8.23.14@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1573509616 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 [677279.990830] Lustre: 62657:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1573509630/real 1573509630] req@ffff9a6f21eb7500 x1649842009032560/t0(0) o104->fir-MDT0002@10.8.23.14@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1573509637 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [677280.018179] Lustre: 62657:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 2 previous similar messages [677315.030660] Lustre: 
62657:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1573509665/real 1573509665] req@ffff9a6f21eb7500 x1649842009032560/t0(0) o104->fir-MDT0002@10.8.23.14@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1573509672 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [677315.058019] Lustre: 62657:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 4 previous similar messages [677378.070758] LustreError: 62657:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) ### client (nid 10.8.23.14@o2ib6) returned error from blocking AST (req@ffff9a6f21eb7500 x1649842009032560 status -107 rc -107), evict it ns: mdt-fir-MDT0002_UUID lock: ffff9a575a726e40/0xebb19283f6e63ec0 lrc: 4/0,0 mode: PR/PR res: [0x2c002beea:0x48d0:0x0].0x0 bits 0x20/0x0 rrc: 21 type: IBT flags: 0x60200400000020 nid: 10.8.23.14@o2ib6 remote: 0x1da7d92a855d8618 expref: 63 pid: 62674 timeout: 677509 lvb_type: 0 [677378.113983] LustreError: 62657:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) Skipped 1 previous similar message [677378.124173] LustreError: 138-a: fir-MDT0002: A client on nid 10.8.23.14@o2ib6 was evicted due to a lock blocking callback time out: rc -107 [677378.136934] LustreError: Skipped 1 previous similar message [677378.142625] LustreError: 59389:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 126s: evicting client at 10.8.23.14@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff9a575a726e40/0xebb19283f6e63ec0 lrc: 3/0,0 mode: PR/PR res: [0x2c002beea:0x48d0:0x0].0x0 bits 0x20/0x0 rrc: 21 type: IBT flags: 0x60200400000020 nid: 10.8.23.14@o2ib6 remote: 0x1da7d92a855d8618 expref: 64 pid: 62674 timeout: 0 lvb_type: 0 [677388.832629] Lustre: fir-MDT0002: Connection restored to (at 10.8.23.14@o2ib6) [677873.768208] Lustre: fir-MDT0002: Connection restored to (at 10.8.31.6@o2ib6) [678483.973298] Lustre: fir-MDT0002: haven't heard from client b969655b-a18e-0100-9d4e-6e9e8481a824 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6669f38000, cur 1573510841 expire 1573510691 last 1573510614 [678593.549036] Lustre: fir-MDT0002: Connection restored to (at 10.8.23.14@o2ib6) [679819.011118] Lustre: fir-MDT0002: haven't heard from client 43a5bd83-c570-e111-099a-aca9976c475c (at 10.9.101.60@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a91a7fc8c00, cur 1573512176 expire 1573512026 last 1573511949 [680420.441717] Lustre: fir-MDT0002: Connection restored to (at 10.9.104.61@o2ib4) [680563.022301] Lustre: fir-MDT0002: haven't heard from client 9746266f-f264-f12c-ff85-91ec754406e9 (at 10.9.102.31@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a8ff4224000, cur 1573512920 expire 1573512770 last 1573512693 [681507.368929] Lustre: fir-MDT0002: Connection restored to (at 10.8.27.15@o2ib6) [681546.616151] Lustre: fir-MDT0002: Connection restored to (at 10.9.101.60@o2ib4) [682422.593061] Lustre: fir-MDT0002: Connection restored to (at 10.9.102.32@o2ib4) [682426.589550] Lustre: fir-MDT0002: Connection restored to fc3bb080-3a71-6c69-ed32-4f484b0087e1 (at 10.9.102.28@o2ib4) [682430.780529] Lustre: fir-MDT0002: Connection restored to (at 10.9.102.29@o2ib4) [682442.959618] Lustre: fir-MDT0002: Connection restored to (at 10.9.102.30@o2ib4) [690022.252649] Lustre: fir-MDT0002: haven't heard from client 383ec4f9-4860-c2f2-0bb5-854b4d573cfa (at 10.9.109.27@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9a55e3d7e800, cur 1573522379 expire 1573522229 last 1573522152 [690022.274546] Lustre: Skipped 1 previous similar message [690059.072585] Lustre: fir-MDT0002: Connection restored to (at 10.9.109.27@o2ib4) [705966.993456] Lustre: DEBUG MARKER: Mon Nov 11 21:58:43 2019 [708890.747716] Lustre: fir-MDT0002: haven't heard from client fa08a5e1-f3d5-fcf0-7657-236f6a1e3164 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6a21b11400, cur 1573541247 expire 1573541097 last 1573541020 [709066.396390] Lustre: fir-MDT0002: Connection restored to (at 10.8.23.14@o2ib6) [709207.819267] Lustre: fir-MDD0002: changelog off [710529.948050] Lustre: fir-MDT0002: Connection restored to (at 10.8.23.14@o2ib6) [710561.803744] Lustre: fir-MDT0002: haven't heard from client 7262d67d-95bc-806b-aefa-e4bfbcc62813 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a702f383000, cur 1573542918 expire 1573542768 last 1573542691 [711096.694626] Lustre: fir-MDT0002: Connection restored to (at 10.8.23.14@o2ib6) [711147.807781] Lustre: fir-MDT0002: haven't heard from client 47bf463f-b834-3dae-7d0a-66f747d6aab5 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a7cd67dcc00, cur 1573543504 expire 1573543354 last 1573543277 [712135.340910] Lustre: fir-MDT0002: Connection restored to (at 10.8.23.14@o2ib6) [712169.835689] Lustre: fir-MDT0002: haven't heard from client a07a13de-7973-0681-60f1-5c7be51c3eff (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a5da5df3400, cur 1573544526 expire 1573544376 last 1573544299 [714031.085990] Lustre: fir-MDT0002: Connection restored to (at 10.8.23.14@o2ib6) [714057.889583] Lustre: fir-MDT0002: haven't heard from client 882fcb14-0061-ad16-7cd0-2eb0941f2b7c (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6ddcef2000, cur 1573546414 expire 1573546264 last 1573546187 [714309.890281] Lustre: fir-MDT0002: haven't heard from client 39059294-9b95-fff7-fbbb-a9a911606907 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a79802e1c00, cur 1573546666 expire 1573546516 last 1573546439 [714356.766974] Lustre: fir-MDT0002: Connection restored to (at 10.8.23.14@o2ib6) [715494.921692] Lustre: fir-MDT0002: haven't heard from client 131d47ac-360c-8b80-7978-3ab9624dffab (at 10.9.106.54@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a91000ed400, cur 1573547851 expire 1573547701 last 1573547624 [715618.337438] Lustre: fir-MDT0002: Connection restored to (at 10.9.106.54@o2ib4) [723405.763073] Lustre: fir-MDT0002: Connection restored to a402a874-21a6-76c0-04c7-cb9a15009f9d (at 10.9.101.42@o2ib4) [742012.611376] Lustre: fir-MDT0002: haven't heard from client e396d93f-3bc1-ad7b-8fc6-f8f980047572 (at 10.9.101.42@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a9117e33000, cur 1573574368 expire 1573574218 last 1573574141 [742241.576814] Lustre: fir-MDT0002: Connection restored to (at 10.9.102.31@o2ib4) [743229.158273] Lustre: fir-MDT0002: Connection restored to (at 10.8.28.1@o2ib6) [743595.174073] Lustre: fir-MDT0002: Connection restored to (at 10.8.28.1@o2ib6) [743781.234368] Lustre: fir-MDT0002: Connection restored to a402a874-21a6-76c0-04c7-cb9a15009f9d (at 10.9.101.42@o2ib4) [754804.937935] Lustre: fir-MDT0002: haven't heard from client 463d3942-0c6e-8904-d983-8beb8e77923d (at 10.8.31.2@o2ib6) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff9a91000ec000, cur 1573587160 expire 1573587010 last 1573586933 [764770.692897] Lustre: fir-MDT0002: Connection restored to (at 10.9.101.60@o2ib4) [765262.800735] Lustre: 62619:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1573597610/real 1573597610] req@ffff9a6dfcf1f980 x1649842153936064/t0(0) o104->fir-MDT0002@10.9.0.63@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1573597617 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 [765262.827989] Lustre: 62619:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 9 previous similar messages [765276.838074] Lustre: 62619:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1573597624/real 1573597624] req@ffff9a6dfcf1f980 x1649842153936064/t0(0) o104->fir-MDT0002@10.9.0.63@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1573597631 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [765276.865346] Lustre: 62619:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 3 previous similar messages [765289.197458] Lustre: fir-MDT0002: haven't heard from client 3efe8bf8-66ec-7ea4-2df5-b70424ee9926 (at 10.9.108.26@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a91ba0d5000, cur 1573597644 expire 1573597494 last 1573597417 [765289.219341] Lustre: Skipped 2 previous similar messages [765297.875584] Lustre: 62619:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1573597645/real 1573597645] req@ffff9a6dfcf1f980 x1649842153936064/t0(0) o104->fir-MDT0002@10.9.0.63@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1573597652 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [765297.902840] Lustre: 62619:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 5 previous similar messages [765332.913425] Lustre: 62619:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1573597680/real 1573597680] req@ffff9a6dfcf1f980 x1649842153936064/t0(0) o104->fir-MDT0002@10.9.0.63@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1573597687 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [765332.940683] Lustre: 62619:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 9 previous similar messages [765360.952134] LustreError: 62619:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) ### client (nid 10.9.0.63@o2ib4) failed to reply to blocking AST (req@ffff9a6dfcf1f980 x1649842153936064 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff9a86e2b2e0c0/0xebb192855d3fb76e lrc: 4/0,0 mode: PR/PR res: [0x2c0032283:0x10821:0x0].0x0 bits 0x13/0x0 rrc: 8 type: IBT flags: 0x60200400000020 nid: 10.9.0.63@o2ib4 remote: 0xacb6c523f0c91639 expref: 2881 pid: 62607 timeout: 765433 lvb_type: 0 [765360.994995] LustreError: 138-a: fir-MDT0002: A client on nid 10.9.0.63@o2ib4 was evicted due to a lock blocking callback time out: rc -110 [765361.007534] LustreError: 59389:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 105s: evicting client at 10.9.0.63@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff9a86e2b2e0c0/0xebb192855d3fb76e lrc: 3/0,0 mode: PR/PR res: [0x2c0032283:0x10821:0x0].0x0 bits 0x13/0x0 rrc: 8 type: IBT flags: 0x60200400000020 nid: 10.9.0.63@o2ib4 remote: 0xacb6c523f0c91639 expref: 2882 pid: 62607 timeout: 0 lvb_type: 0 [766698.862287] Lustre: fir-MDT0002: Connection restored to 0e0bd59d-647b-5d74-c34b-4512d443b11f (at 10.9.108.2@o2ib4) [767233.877423] Lustre: fir-MDT0002: Connection restored to (at 10.9.0.63@o2ib4) [825049.731899] Lustre: fir-MDT0002: haven't heard from client 
d6fb748d-e0c8-499a-6bd4-a486a265bb56 (at 10.8.23.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a89d807c000, cur 1573657403 expire 1573657253 last 1573657176 [825049.753690] Lustre: Skipped 1 previous similar message [834462.560762] Lustre: fir-MDT0002: Connection restored to (at 10.8.13.29@o2ib6) [834463.230343] Lustre: fir-MDT0002: Connection restored to (at 10.8.13.28@o2ib6) [836134.011752] Lustre: fir-MDT0002: haven't heard from client 0fd1db7a-906d-6dc2-d010-665c1d8dab04 (at 10.8.13.28@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6b3cdb8c00, cur 1573668487 expire 1573668337 last 1573668260 [837614.816250] Lustre: fir-MDT0002: Connection restored to (at 10.8.31.10@o2ib6) [837621.968454] Lustre: fir-MDT0002: Connection restored to (at 10.8.31.9@o2ib6) [837625.129527] Lustre: fir-MDT0002: Connection restored to (at 10.8.13.28@o2ib6) [837639.393859] Lustre: fir-MDT0002: Connection restored to (at 10.8.31.8@o2ib6) [837643.435275] Lustre: fir-MDT0002: Connection restored to (at 10.8.13.29@o2ib6) [837653.213685] Lustre: fir-MDT0002: Connection restored to (at 10.8.31.7@o2ib6) [837653.220920] Lustre: Skipped 2 previous similar messages [837671.694868] Lustre: fir-MDT0002: Connection restored to (at 10.8.31.1@o2ib6) [837671.702095] Lustre: Skipped 2 previous similar messages [838918.080464] Lustre: fir-MDT0002: Connection restored to (at 10.8.19.3@o2ib6) [838918.087701] Lustre: Skipped 4 previous similar messages [839244.273958] Lustre: fir-MDT0002: Connection restored to (at 10.8.23.12@o2ib6) [887987.388208] Lustre: fir-MDT0002: haven't heard from client dfbf3ee2-1375-c7ab-f070-2d0110b596af (at 10.9.112.15@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6933561c00, cur 1573720339 expire 1573720189 last 1573720112 [887987.410082] Lustre: Skipped 1 previous similar message [900991.694095] Lustre: fir-MDT0002: haven't heard from client c63740fb-976e-0cae-82b2-d790bb676c66 (at 10.9.103.26@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a8db0838c00, cur 1573733343 expire 1573733193 last 1573733116 [915394.093509] Lustre: fir-MDT0002: haven't heard from client 0ecf568b-1011-1120-1681-ecfcd2f5ff0d (at 10.9.112.13@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a91a399d400, cur 1573747745 expire 1573747595 last 1573747518 [919644.758487] Lustre: fir-MDT0002: Connection restored to (at 10.9.103.26@o2ib4) [920657.194391] Lustre: fir-MDT0002: haven't heard from client a3e758b6-0383-e5ba-1ad7-84d3c0633f8f (at 10.9.108.3@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a8db083ac00, cur 1573753008 expire 1573752858 last 1573752781 [921847.415132] Lustre: fir-MDT0002: Connection restored to 0ecf568b-1011-1120-1681-ecfcd2f5ff0d (at 10.9.112.13@o2ib4) [921851.678857] Lustre: fir-MDT0002: Connection restored to (at 10.9.112.15@o2ib4) [922017.858562] Lustre: fir-MDT0002: Connection restored to (at 10.9.108.3@o2ib4) [922254.541620] Lustre: fir-MDT0002: Connection restored to 28907530-4293-a2f1-9e77-895a96735526 (at 10.8.20.7@o2ib6) [922299.008184] Lustre: fir-MDT0002: Connection restored to (at 10.8.30.14@o2ib6) [981475.779554] Lustre: fir-MDT0002: haven't heard from client 2e71b661-9e58-6682-d613-1dbc9ce0ed6f (at 10.9.102.17@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9a91a8195800, cur 1573813825 expire 1573813675 last 1573813598 [1007134.111913] Lustre: fir-MDT0002: Connection restored to (at 10.9.102.17@o2ib4) [1033545.119642] Lustre: fir-MDT0002: haven't heard from client 44f8b2ee-997a-76ac-e411-461a192e3e81 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a8cc72bf400, cur 1573865893 expire 1573865743 last 1573865666 [1039378.833795] Lustre: fir-MDT0002: Connection restored to (at 10.8.9.2@o2ib6) [1062591.440170] Lustre: 62522:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1573894931/real 1573894931] req@ffff9a6779f34380 x1649843755542368/t0(0) o104->fir-MDT0002@10.9.109.37@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1573894938 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 [1062591.467686] Lustre: 62522:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 10 previous similar messages [1062605.477526] Lustre: 62522:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1573894945/real 1573894945] req@ffff9a6779f34380 x1649843755542368/t0(0) o104->fir-MDT0002@10.9.109.37@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1573894952 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [1062605.505035] Lustre: 62522:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 1 previous similar message [1062626.515054] Lustre: 62522:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1573894966/real 1573894966] req@ffff9a6779f34380 x1649843755542368/t0(0) o104->fir-MDT0002@10.9.109.37@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1573894973 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [1062626.542564] Lustre: 62522:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 2 previous similar messages [1062661.552938] Lustre: 62522:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1573895001/real 1573895001] req@ffff9a6779f34380 x1649843755542368/t0(0) o104->fir-MDT0002@10.9.109.37@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1573895008 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [1062661.580452] Lustre: 62522:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 4 previous similar messages [1062689.590655] LustreError: 62522:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) ### client (nid 10.9.109.37@o2ib4) failed to reply to blocking AST (req@ffff9a6779f34380 x1649843755542368 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff9a7ad9e60480/0xebb1928abe9ed1d9 lrc: 4/0,0 mode: PR/PR res: [0x2c0033d31:0x16ddb:0x0].0x0 bits 0x1b/0x0 rrc: 24 type: IBT flags: 0x60200400000020 nid: 10.9.109.37@o2ib4 remote: 0xd52f3363208534b0 expref: 965 pid: 59720 timeout: 1062754 lvb_type: 0 [1062689.634042] LustreError: 62522:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) Skipped 2 previous similar messages [1062689.644305] LustreError: 138-a: fir-MDT0002: A client on nid 10.9.109.37@o2ib4 was evicted due to a lock blocking callback time out: rc -110 [1062689.657101] LustreError: Skipped 2 previous similar messages [1062689.662966] LustreError: 59389:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 105s: evicting client at 10.9.109.37@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff9a7ad9e60480/0xebb1928abe9ed1d9 lrc: 3/0,0 mode: PR/PR res: [0x2c0033d31:0x16ddb:0x0].0x0 bits 0x1b/0x0 rrc: 24 type: IBT flags: 0x60200400000020 nid: 10.9.109.37@o2ib4 remote: 0xd52f3363208534b0 expref: 966 pid: 59720 timeout: 0 lvb_type: 0 [1094301.663458] Lustre: fir-MDT0002: haven't heard from client 
65477f77-7974-8332-865b-997a80dc0836 (at 10.9.109.18@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a8bc97dcc00, cur 1573926648 expire 1573926498 last 1573926421 [1200456.355515] Lustre: fir-MDT0002: haven't heard from client 7f0150d2-2d13-3e3b-448c-926ab08c2adf (at 10.9.108.20@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a8ace368000, cur 1574032800 expire 1574032650 last 1574032573 [1200456.377486] Lustre: Skipped 3 previous similar messages [1224540.980088] Lustre: fir-MDT0002: haven't heard from client 270b3b8a-9420-c741-af68-3f6c40e0e650 (at 10.9.112.15@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a902429f800, cur 1574056884 expire 1574056734 last 1574056657 [1263789.559371] Lustre: fir-MDT0002: Connection restored to (at 10.9.112.15@o2ib4) [1264079.190903] Lustre: fir-MDT0002: Connection restored to (at 10.9.109.19@o2ib4) [1264084.364994] Lustre: fir-MDT0002: Connection restored to (at 10.9.109.37@o2ib4) [1264090.262176] Lustre: fir-MDT0002: Connection restored to (at 10.9.108.20@o2ib4) [1264103.982916] Lustre: fir-MDT0002: Connection restored to (at 10.9.109.17@o2ib4) [1264151.576376] Lustre: fir-MDT0002: Connection restored to f93816ac-828b-55dd-4d69-fb089c1ad92a (at 10.9.109.20@o2ib4) [1264151.586990] Lustre: Skipped 1 previous similar message [1264212.765214] Lustre: fir-MDT0002: Connection restored to (at 10.8.26.4@o2ib6) [1288602.822586] Lustre: fir-MDT0002: Client 51390574-c509-f8c2-383b-446baae03d6d (at 10.9.0.61@o2ib4) reconnecting [1288602.832802] Lustre: fir-MDT0002: Connection restored to (at 10.9.0.61@o2ib4) [1288652.103756] LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.9.108.72@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. [1288668.106541] LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.9.102.69@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. [1288668.124100] LustreError: Skipped 134 previous similar messages [1288702.701879] Lustre: fir-MDT0002: Client 51390574-c509-f8c2-383b-446baae03d6d (at 10.9.0.61@o2ib4) reconnecting [1288702.712083] Lustre: fir-MDT0002: Connection restored to (at 10.9.0.61@o2ib4) [1288731.174956] Lustre: fir-MDT0002: Client 51390574-c509-f8c2-383b-446baae03d6d (at 10.9.0.61@o2ib4) reconnecting [1288731.185157] Lustre: fir-MDT0002: Connection restored to (at 10.9.0.61@o2ib4) [1288796.400253] LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.9.0.61@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. 
[1288796.417633] LustreError: Skipped 128 previous similar messages [1288821.488824] Lustre: fir-MDT0002: Client 51390574-c509-f8c2-383b-446baae03d6d (at 10.9.0.61@o2ib4) reconnecting [1288821.499055] Lustre: fir-MDT0002: Connection restored to (at 10.9.0.61@o2ib4) [1288841.393469] Lustre: fir-MDT0002: Client 51390574-c509-f8c2-383b-446baae03d6d (at 10.9.0.61@o2ib4) reconnecting [1288935.283544] Lustre: fir-MDT0002: Client 51390574-c509-f8c2-383b-446baae03d6d (at 10.9.0.61@o2ib4) reconnecting [1288935.293751] Lustre: fir-MDT0002: Connection restored to (at 10.9.0.61@o2ib4) [1288935.301075] Lustre: Skipped 1 previous similar message [1290732.522386] Lustre: fir-MDT0002: Client cec884d3-ca4b-8127-2f6b-7762665aa5f8 (at 10.9.0.64@o2ib4) reconnecting [1290732.532582] Lustre: fir-MDT0002: Connection restored to (at 10.9.0.64@o2ib4) [1290806.563791] LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.9.0.64@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. [1290831.659575] Lustre: fir-MDT0002: Client cec884d3-ca4b-8127-2f6b-7762665aa5f8 (at 10.9.0.64@o2ib4) reconnecting [1290831.669784] Lustre: fir-MDT0002: Connection restored to (at 10.9.0.64@o2ib4) [1311527.206627] Lustre: fir-MDT0002: haven't heard from client 60eedf7b-2931-5ba5-38d8-f768383003aa (at 10.9.112.15@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a53e92e7400, cur 1574143868 expire 1574143718 last 1574143641 [1325813.574641] Lustre: fir-MDT0002: haven't heard from client d1c2bfa2-0afe-5a67-e9f1-19f32adca6e9 (at 10.9.103.17@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a91b890b800, cur 1574158154 expire 1574158004 last 1574157927 [1406897.660354] Lustre: fir-MDT0002: haven't heard from client a78da21a-1983-9c53-0c7f-bf644bbe7985 (at 10.9.109.37@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a8d061b4800, cur 1574239236 expire 1574239086 last 1574239009 [1406947.509040] Lustre: fir-MDT0002: Connection restored to (at 10.9.109.37@o2ib4) [1413486.829645] Lustre: fir-MDT0002: haven't heard from client 08d52529-6ee9-4863-0489-8f7637c36108 (at 10.9.109.37@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a787319e800, cur 1574245825 expire 1574245675 last 1574245598 [1413721.394972] Lustre: fir-MDT0002: Connection restored to (at 10.9.109.37@o2ib4) [1473848.396717] Lustre: fir-MDT0002: haven't heard from client 52903504-32b3-08ac-31d3-e5b5c8cf2ff7 (at 10.9.105.33@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a8de878e000, cur 1574306185 expire 1574306035 last 1574305958 [1496161.803930] Lustre: fir-MDT0002: Connection restored to (at 10.8.26.4@o2ib6) [1496183.976617] Lustre: fir-MDT0002: haven't heard from client ac7f7f01-f9d5-1d67-db3a-a2d866eea163 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6fc7396800, cur 1574328520 expire 1574328370 last 1574328293 [1522267.679563] Lustre: fir-MDT0002: haven't heard from client 6b0d620f-0339-f9fa-bf61-97590cdd033f (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a701e668c00, cur 1574354603 expire 1574354453 last 1574354376 [1522341.462283] Lustre: fir-MDT0002: Connection restored to (at 10.8.26.4@o2ib6) [1531890.904385] Lustre: fir-MDT0002: haven't heard from client 6d425bf1-ae64-458c-862a-5429e8d0ec7d (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9a5c41971800, cur 1574364226 expire 1574364076 last 1574363999 [1532067.740940] Lustre: fir-MDT0002: Connection restored to (at 10.8.26.4@o2ib6) [1532347.956713] Lustre: fir-MDT0002: Connection restored to (at 10.8.26.4@o2ib6) [1532370.918875] Lustre: fir-MDT0002: haven't heard from client 9f74f2f7-345c-d393-718a-6d710c5e071f (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6b16b5c400, cur 1574364706 expire 1574364556 last 1574364479 [1533453.945750] Lustre: fir-MDT0002: haven't heard from client d8f2e4b0-ea8c-ec45-cbe3-b4fa341d66fe (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6045abe000, cur 1574365789 expire 1574365639 last 1574365562 [1533824.953499] Lustre: fir-MDT0002: Connection restored to (at 10.8.26.4@o2ib6) [1536135.046372] Lustre: fir-MDT0002: haven't heard from client a485129a-c5e3-046f-ce0c-07abefc364b6 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a59f4f69000, cur 1574368470 expire 1574368320 last 1574368243 [1536184.036482] Lustre: fir-MDT0002: Connection restored to (at 10.8.26.4@o2ib6) [1536614.993435] Lustre: fir-MDT0002: Connection restored to (at 10.8.26.4@o2ib6) [1536626.036409] Lustre: fir-MDT0002: haven't heard from client ea409c8a-d961-421c-2d5e-9d7a6bd93925 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a5752e29400, cur 1574368961 expire 1574368811 last 1574368734 [1539728.115860] Lustre: fir-MDT0002: haven't heard from client 76301894-a53a-5581-5cc2-7ff0eca95324 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a5752e2e800, cur 1574372063 expire 1574371913 last 1574371836 [1539819.289962] Lustre: fir-MDT0002: Connection restored to (at 10.8.26.4@o2ib6) [1540311.124729] Lustre: fir-MDT0002: haven't heard from client 3a9fa408-732c-4291-7f13-3ba8991dc846 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a675df8c800, cur 1574372646 expire 1574372496 last 1574372419 [1540347.758956] Lustre: fir-MDT0002: Connection restored to (at 10.8.26.4@o2ib6) [1543873.110683] Lustre: fir-MDT0002: Connection restored to (at 10.8.26.4@o2ib6) [1543912.218320] Lustre: fir-MDT0002: haven't heard from client 14111ea9-6dc9-c218-1f5b-d51125ae35d0 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a70c7a8a400, cur 1574376247 expire 1574376097 last 1574376020 [1546083.300876] Lustre: fir-MDT0002: haven't heard from client f41f3c06-c341-d601-349e-5a788a74160e (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a815bd47400, cur 1574378418 expire 1574378268 last 1574378191 [1546164.000396] Lustre: fir-MDT0002: Connection restored to (at 10.8.26.4@o2ib6) [1546580.198517] Lustre: fir-MDT0002: Connection restored to (at 10.8.26.4@o2ib6) [1546581.301233] Lustre: fir-MDT0002: haven't heard from client 724d6f47-b034-f0d2-6e16-f086638c8020 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a5724522400, cur 1574378916 expire 1574378766 last 1574378689 [1547891.231727] Lustre: fir-MDT0002: Connection restored to (at 10.8.26.4@o2ib6) [1547925.325824] Lustre: fir-MDT0002: haven't heard from client 541efe60-3c67-f4d7-0817-d30d0311a549 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9a5724526400, cur 1574380260 expire 1574380110 last 1574380033 [1548144.381982] Lustre: fir-MDT0002: haven't heard from client 4698382a-9c15-fc2e-605d-16c58278088b (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6dbf39f400, cur 1574380479 expire 1574380329 last 1574380252 [1548525.493991] Lustre: fir-MDT0002: Connection restored to (at 10.8.26.4@o2ib6) [1549168.364891] Lustre: fir-MDT0002: haven't heard from client 26319490-7a45-9f99-8d7f-86a5a40f6af8 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a718bd0dc00, cur 1574381503 expire 1574381353 last 1574381276 [1549416.246388] Lustre: fir-MDT0002: Connection restored to (at 10.8.26.4@o2ib6) [1549744.373302] Lustre: fir-MDT0002: haven't heard from client 65dd2c66-edc9-dd46-19e1-f8109ec90718 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a789ff3ac00, cur 1574382079 expire 1574381929 last 1574381852 [1549962.688972] Lustre: fir-MDT0002: Connection restored to (at 10.8.26.4@o2ib6) [1550717.400633] Lustre: fir-MDT0002: haven't heard from client 041ba9ed-c76e-d3f3-4f37-38177cd14815 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a7b30d2e000, cur 1574383052 expire 1574382902 last 1574382825 [1550748.413699] Lustre: fir-MDT0002: Connection restored to (at 10.8.26.4@o2ib6) [1552870.460532] Lustre: fir-MDT0002: haven't heard from client d6c961f6-3b1a-598a-cc6f-b13a518f75fa (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a7aeafe3400, cur 1574385205 expire 1574385055 last 1574384978 [1552925.733776] Lustre: fir-MDT0002: Connection restored to (at 10.8.26.4@o2ib6) [1553933.378203] Lustre: fir-MDT0002: Connection restored to (at 10.8.26.4@o2ib6) [1553981.598061] Lustre: fir-MDT0002: haven't heard from client 31b801e9-4ec0-25a9-df0a-c0a070a14c2d (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a7f77ebfc00, cur 1574386316 expire 1574386166 last 1574386089 [1554462.573671] Lustre: fir-MDT0002: haven't heard from client 563e1ed1-fb3a-9d4d-1922-e8735133494c (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a790900b400, cur 1574386797 expire 1574386647 last 1574386570 [1554566.413773] Lustre: fir-MDT0002: Connection restored to (at 10.8.26.4@o2ib6) [1557704.591040] Lustre: fir-MDT0002: haven't heard from client 3af119a0-3930-9946-5e2e-703bd8fa4045 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a77be0ac800, cur 1574390039 expire 1574389889 last 1574389812 [1557877.403146] Lustre: fir-MDT0002: Connection restored to (at 10.8.26.4@o2ib6) [1558089.627557] Lustre: fir-MDT0002: Connection restored to (at 10.8.26.4@o2ib6) [1558130.619178] Lustre: fir-MDT0002: haven't heard from client c2c1253a-1fbe-dc8c-743f-463d30213621 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a51c8c89400, cur 1574390465 expire 1574390315 last 1574390238 [1577576.103141] Lustre: fir-MDT0002: haven't heard from client 0ef89b5b-e6b1-a00e-46d0-80c0d131b036 (at 10.9.104.5@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a91b94e7000, cur 1574409910 expire 1574409760 last 1574409683 [1577652.105169] Lustre: fir-MDT0002: haven't heard from client 13bafaa0-92e3-38c7-e7a5-5f2b85438861 (at 10.9.101.1@o2ib4) in 224 seconds. I think it's dead, and I am evicting it. 
exp ffff9a91b94e5c00, cur 1574409986 expire 1574409836 last 1574409762 [1577865.115689] Lustre: fir-MDT0002: haven't heard from client a3cc9eb1-2294-e732-bda9-c4307aaf5373 (at 10.9.114.4@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a91a8194400, cur 1574410199 expire 1574410049 last 1574409972 [1578508.649507] Lustre: fir-MDT0002: Connection restored to (at 10.9.102.59@o2ib4) [1578523.158947] Lustre: fir-MDT0002: haven't heard from client fcd46f8d-0218-15f8-3b9f-b1cac1885168 (at 10.9.102.60@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a90e35ea400, cur 1574410857 expire 1574410707 last 1574410630 [1579549.154679] Lustre: fir-MDT0002: haven't heard from client 2f9ad975-3981-ab11-d117-ceb37a6f6927 (at 10.9.106.72@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a8d39c12800, cur 1574411883 expire 1574411733 last 1574411656 [1579549.176644] Lustre: Skipped 1 previous similar message [1579861.475769] Lustre: fir-MDT0002: Connection restored to (at 10.9.102.31@o2ib4) [1580231.224781] Lustre: fir-MDT0002: haven't heard from client 8241bde0-bccc-7259-0c08-0a40839cf181 (at 10.9.101.19@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a8bc97d8800, cur 1574412565 expire 1574412415 last 1574412338 [1580231.246741] Lustre: Skipped 2 previous similar messages [1580307.177362] Lustre: fir-MDT0002: haven't heard from client 17c57638-3006-fe65-ba32-3cab72740d05 (at 10.9.103.18@o2ib4) in 222 seconds. I think it's dead, and I am evicting it. exp ffff9a8ace36ec00, cur 1574412641 expire 1574412491 last 1574412419 [1580307.199325] Lustre: Skipped 1 previous similar message [1580383.209446] Lustre: fir-MDT0002: haven't heard from client 1df35727-5ac0-59c9-4ffe-189b6429fb3c (at 10.9.106.43@o2ib4) in 221 seconds. I think it's dead, and I am evicting it. exp ffff9a8ff4226c00, cur 1574412717 expire 1574412567 last 1574412496 [1581051.214521] Lustre: fir-MDT0002: haven't heard from client 9ece330d-bedb-5433-d5b6-5c2a4b000372 (at 10.9.106.7@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a90def4b400, cur 1574413385 expire 1574413235 last 1574413158 [1581139.654901] Lustre: fir-MDT0002: Connection restored to (at 10.9.106.7@o2ib4) [1581359.201979] Lustre: fir-MDT0002: haven't heard from client 2706daf9-de75-9285-363a-2c5d017b697b (at 10.9.104.42@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a91bed0f400, cur 1574413693 expire 1574413543 last 1574413466 [1583439.259580] Lustre: fir-MDT0002: haven't heard from client a425d8fa-bfc9-3659-2975-9c284b26a859 (at 10.9.112.16@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a9194a17800, cur 1574415773 expire 1574415623 last 1574415546 [1584826.290313] Lustre: fir-MDT0002: haven't heard from client fc986e43-3f52-ee44-b77e-4e68f09360fb (at 10.9.108.43@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a91b94e6800, cur 1574417160 expire 1574417010 last 1574416933 [1585576.311216] Lustre: fir-MDT0002: haven't heard from client bc49c81c-f999-d708-9be3-04d8f5ab9ed7 (at 10.9.116.9@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9a8d127acc00, cur 1574417910 expire 1574417760 last 1574417683 [1585576.333094] Lustre: Skipped 2 previous similar messages [1585614.060088] Lustre: fir-MDT0002: Connection restored to (at 10.9.116.9@o2ib4) [1585652.313204] Lustre: fir-MDT0002: haven't heard from client e5776e95-9f22-56f8-6603-c4fbce05c22a (at 10.9.105.10@o2ib4) in 181 seconds. I think it's dead, and I am evicting it. exp ffff9a91b94e2000, cur 1574417986 expire 1574417836 last 1574417805 [1590600.444047] Lustre: fir-MDT0002: haven't heard from client 85504bb2-f7b5-092e-26ed-af66ae870b90 (at 10.9.115.3@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a8de878b400, cur 1574422934 expire 1574422784 last 1574422707 [1591146.463851] Lustre: fir-MDT0002: haven't heard from client d427ec37-74b2-de70-a8af-6fe6162c6690 (at 10.9.101.41@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a90ff0dc800, cur 1574423480 expire 1574423330 last 1574423253 [1591192.929215] Lustre: fir-MDT0002: Connection restored to (at 10.9.106.39@o2ib4) [1593195.512801] Lustre: fir-MDT0002: haven't heard from client c2b765b9-142b-87fc-7917-65846ba1bb6c (at 10.9.102.67@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a8d127a8800, cur 1574425529 expire 1574425379 last 1574425302 [1593195.534764] Lustre: Skipped 1 previous similar message [1593260.032407] Lustre: fir-MDT0002: Connection restored to (at 10.9.104.17@o2ib4) [1594305.552489] Lustre: fir-MDT0002: haven't heard from client b5ed41b3-c37d-3c0f-3eba-6a4af5e41de3 (at 10.9.105.24@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a91ba0d5c00, cur 1574426639 expire 1574426489 last 1574426412 [1594305.574457] Lustre: Skipped 1 previous similar message [1597789.015455] Lustre: fir-MDT0002: Connection restored to (at 10.9.102.61@o2ib4) [1597838.636159] Lustre: fir-MDT0002: haven't heard from client 96bde3ab-5818-fd14-5fdf-db5c1461a7eb (at 10.9.102.61@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a87a065d400, cur 1574430172 expire 1574430022 last 1574429945 [1602204.758029] Lustre: fir-MDT0002: haven't heard from client 087370e2-c2d0-724e-3047-dd85c148ca31 (at 10.9.116.2@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a91b5c00000, cur 1574434538 expire 1574434388 last 1574434311 [1607260.885510] Lustre: fir-MDT0002: haven't heard from client b4363ded-b523-41c1-61de-febb05b883a3 (at 10.9.104.34@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a919d655400, cur 1574439594 expire 1574439444 last 1574439367 [1607795.677577] LustreError: 11-0: fir-MDT0003-osp-MDT0002: operation mds_statfs to node 10.0.10.54@o2ib7 failed: rc = -107 [1607795.688530] LustreError: Skipped 2 previous similar messages [1607795.694374] Lustre: fir-MDT0003-osp-MDT0002: Connection to fir-MDT0003 (at 10.0.10.54@o2ib7) was lost; in progress operations using this service will wait for recovery to complete [1607818.039208] LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.0.10.51@o2ib7 (no target). If you are running an HA pair check that the target is mounted on the other server. [1607819.040324] LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.9.117.41@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. 
[1607819.057869] LustreError: Skipped 1 previous similar message [1607821.043502] LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.20.13@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. [1607821.060968] LustreError: Skipped 59 previous similar messages [1607825.049501] LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.9.104.40@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. [1607825.067052] LustreError: Skipped 232 previous similar messages [1607833.073390] LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.9.109.38@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. [1607833.090936] LustreError: Skipped 412 previous similar messages [1607849.075777] LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.18.13@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. [1607849.093241] LustreError: Skipped 470 previous similar messages [1607885.326725] LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.9.107.30@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. [1607885.344271] LustreError: Skipped 176 previous similar messages [1607902.066209] LNetError: 59266:0:(o2iblnd_cb.c:3350:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds [1607902.076559] LNetError: 59266:0:(o2iblnd_cb.c:3425:kiblnd_check_conns()) Timed out RDMA with 10.0.10.54@o2ib7 (106): c: 8, oc: 0, rc: 8 [1607906.016801] LNetError: 65324:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900 [1607906.957748] LNetError: 65324:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900 [1607907.957624] LNetError: 65324:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900 [1607949.332422] LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.26.4@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. [1607949.349795] LustreError: Skipped 1170 previous similar messages [1608011.069015] LNet: 59266:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.54@o2ib7: 1 seconds [1608011.079293] LNetError: 59266:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900 [1608011.091390] LNetError: 59266:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 1 previous similar message [1608017.069170] LNet: 59266:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.54@o2ib7: 0 seconds [1608017.079460] LNetError: 59266:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900 [1608023.069325] LNet: 59266:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.54@o2ib7: 0 seconds [1608029.069493] LNet: 59266:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.54@o2ib7: 0 seconds [1608029.079772] LNetError: 59266:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. 
Health = 900 [1608029.091886] LNetError: 59266:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 1 previous similar message [1608086.033807] LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.9.107.30@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. [1608086.051352] LustreError: Skipped 1546 previous similar messages [1608108.071501] LNet: 59266:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.54@o2ib7: 0 seconds [1608108.081769] LNetError: 59266:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900 [1608120.071820] LNet: 59266:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.54@o2ib7: 0 seconds [1608120.082075] LNet: 59266:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 1 previous similar message [1608209.074093] LNet: 59266:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.54@o2ib7: 0 seconds [1608209.084348] LNet: 59266:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 1 previous similar message [1608209.093756] LNetError: 59266:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900 [1608209.105846] LNetError: 59266:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 3 previous similar messages [1608312.076711] LNet: 59266:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.54@o2ib7: 0 seconds [1608312.086969] LNet: 59266:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 3 previous similar messages [1608312.096457] LNetError: 59266:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900 [1608312.108544] LNetError: 59266:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 3 previous similar messages [1608349.024787] LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.0.10.111@o2ib7 (no target). If you are running an HA pair check that the target is mounted on the other server. [1608349.042331] LustreError: Skipped 2791 previous similar messages [1608412.079245] LNet: 59266:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.54@o2ib7: 0 seconds [1608412.089507] LNet: 59266:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 3 previous similar messages [1608514.081860] LNetError: 59266:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900 [1608514.093952] LNetError: 59266:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 7 previous similar messages [1608571.900176] Lustre: fir-MDT0002: Connection restored to (at 10.0.10.54@o2ib7) [1608686.957124] Lustre: fir-MDT0002: haven't heard from client 9f7d3ef3-81b2-c30a-680d-98daf8e9bf59 (at 10.8.21.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a8d127a9c00, cur 1574441020 expire 1574440870 last 1574440793 [1608722.545538] Lustre: fir-MDT0003-osp-MDT0002: Connection restored to 10.0.10.54@o2ib7 (at 10.0.10.54@o2ib7) [1609006.482963] Lustre: fir-MDD0002: changelog on [1609011.274475] Lustre: fir-MDD0002: changelog off [1609014.123795] Lustre: fir-MDD0002: changelog on [1609055.929918] Lustre: fir-MDT0002: haven't heard from client 05d3010e-4bee-e9fd-f2fb-997d23cd25b8 (at 10.9.106.67@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9a91ba0d4800, cur 1574441389 expire 1574441239 last 1574441162 [1609055.951886] Lustre: Skipped 1 previous similar message [1609131.930556] Lustre: fir-MDT0002: haven't heard from client 6c3fdbb8-f579-b7a4-c417-b5982815b7ed (at 10.9.108.36@o2ib4) in 205 seconds. I think it's dead, and I am evicting it. exp ffff9a91ba0d1800, cur 1574441465 expire 1574441315 last 1574441260 [1609340.946321] Lustre: fir-MDT0002: haven't heard from client aeebab48-dcb4-dcdb-0a2d-a04885425c7d (at 10.8.24.36@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a5df9377c00, cur 1574441674 expire 1574441524 last 1574441447 [1609621.711966] Lustre: fir-MDT0002: Connection restored to (at 10.9.110.65@o2ib4) [1609716.974020] Lustre: fir-MDT0002: haven't heard from client 5474fa54-fd68-36be-d5b2-3beffc64c778 (at 10.9.112.14@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a91a399ec00, cur 1574442050 expire 1574441900 last 1574441823 [1609716.995985] Lustre: Skipped 21 previous similar messages [1609792.967548] Lustre: fir-MDT0002: haven't heard from client 6c37213e-a332-f2ff-4ad7-353dc98b1096 (at 10.8.25.19@o2ib6) in 211 seconds. I think it's dead, and I am evicting it. exp ffff9a89d807cc00, cur 1574442126 expire 1574441976 last 1574441915 [1609805.335566] Lustre: fir-MDT0002: Connection restored to (at 10.9.112.15@o2ib4) [1610071.975679] Lustre: fir-MDT0002: haven't heard from client 67360d0f-602d-e0fd-a763-b6dc0eec238b (at 10.8.27.35@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a91000eb000, cur 1574442405 expire 1574442255 last 1574442178 [1610148.965261] Lustre: fir-MDT0002: haven't heard from client 8b4d67fb-27e2-4d36-6351-6d6e292055b7 (at 10.8.23.19@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a89d807f000, cur 1574442482 expire 1574442332 last 1574442255 [1610171.680023] Lustre: fir-MDT0002: Connection restored to fc5a3d4d-111a-c396-b61e-9a2380068329 (at 10.8.9.1@o2ib6) [1610279.959224] Lustre: fir-MDT0002: haven't heard from client 216ed9dd-7aee-ce3b-c0ae-292bb9ebb207 (at 10.9.117.1@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a8db0838400, cur 1574442613 expire 1574442463 last 1574442386 [1610686.987952] Lustre: fir-MDT0002: haven't heard from client a32fc2e6-96c3-c64e-712c-3f0f3cc76fbe (at 10.9.117.13@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a91b890a400, cur 1574443020 expire 1574442870 last 1574442793 [1610760.125421] Lustre: fir-MDT0002: Connection restored to 0e0bd59d-647b-5d74-c34b-4512d443b11f (at 10.9.108.2@o2ib4) [1610827.307754] Lustre: fir-MDT0002: Connection restored to (at 10.9.108.8@o2ib4) [1610993.114688] Lustre: fir-MDT0002: Connection restored to (at 10.9.110.35@o2ib4) [1611036.484646] Lustre: fir-MDT0002: Connection restored to (at 10.8.20.2@o2ib6) [1611036.491967] Lustre: Skipped 1 previous similar message [1611113.560032] Lustre: fir-MDT0002: Connection restored to (at 10.8.7.4@o2ib6) [1611113.567265] Lustre: Skipped 1 previous similar message [1611162.981997] Lustre: fir-MDT0002: haven't heard from client cdc6e170-5304-ef42-8b66-1812bb8613d5 (at 10.8.7.20@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9a919d657400, cur 1574443496 expire 1574443346 last 1574443269 [1611323.183471] Lustre: fir-MDT0002: Connection restored to (at 10.8.26.4@o2ib6) [1611323.190788] Lustre: Skipped 1 previous similar message [1611452.919412] Lustre: fir-MDT0002: Connection restored to (at 10.8.23.28@o2ib6) [1611452.926813] Lustre: Skipped 9 previous similar messages [1611721.785762] Lustre: fir-MDT0002: Connection restored to (at 10.8.31.6@o2ib6) [1611721.793078] Lustre: Skipped 5 previous similar messages [1611737.996658] Lustre: fir-MDT0002: haven't heard from client 023695b3-f11c-67bb-67cb-0822c85f1b10 (at 10.8.30.25@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a87a065e400, cur 1574444071 expire 1574443921 last 1574443844 [1611738.018559] Lustre: Skipped 1 previous similar message [1612518.018582] Lustre: fir-MDT0002: haven't heard from client c31dfa7e-d06c-7874-3891-84e2d0c6fab5 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a51aa9b8800, cur 1574444851 expire 1574444701 last 1574444624 [1612561.985227] Lustre: fir-MDT0002: Connection restored to (at 10.8.26.4@o2ib6) [1613788.061457] Lustre: fir-MDT0002: Connection restored to (at 10.8.26.4@o2ib6) [1614071.059866] Lustre: fir-MDT0002: haven't heard from client 2439d9b4-d712-3839-601c-bee8074d0357 (at 10.9.107.48@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a8ace36a800, cur 1574446404 expire 1574446254 last 1574446177 [1614071.081828] Lustre: Skipped 1 previous similar message [1615535.898299] Lustre: fir-MDT0002: Connection restored to (at 10.9.107.48@o2ib4) [1615750.109316] Lustre: fir-MDT0002: haven't heard from client 51390574-c509-f8c2-383b-446baae03d6d (at 10.9.0.61@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a9128a98400, cur 1574448083 expire 1574447933 last 1574447856 [1615750.131123] Lustre: Skipped 3 previous similar messages [1615847.969393] Lustre: fir-MDT0002: Connection restored to (at 10.9.109.8@o2ib4) [1616361.594680] Lustre: fir-MDT0002: Connection restored to (at 10.9.102.42@o2ib4) [1616607.292964] Lustre: fir-MDT0002: Connection restored to (at 10.9.105.64@o2ib4) [1616607.300462] Lustre: Skipped 1 previous similar message [1617151.999287] Lustre: fir-MDT0002: Connection restored to (at 10.9.106.28@o2ib4) [1617648.955862] Lustre: fir-MDT0002: Connection restored to (at 10.9.108.3@o2ib4) [1617648.963271] Lustre: Skipped 1 previous similar message [1618217.482533] Lustre: fir-MDT0002: Connection restored to (at 10.9.0.61@o2ib4) [1618364.171127] Lustre: fir-MDT0002: haven't heard from client 89fa98d9-287e-09d0-548e-d634c6a32b59 (at 10.9.116.3@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a90ff0dc000, cur 1574450697 expire 1574450547 last 1574450470 [1619516.215786] Lustre: fir-MDT0002: Connection restored to (at 10.9.116.3@o2ib4) [1619516.223184] Lustre: Skipped 1 previous similar message [1620290.262237] Lustre: fir-MDT0002: haven't heard from client 788bab0a-f5c6-5fcf-82a1-7f60857d1618 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a7f613a1800, cur 1574452623 expire 1574452473 last 1574452396 [1623229.292333] Lustre: fir-MDT0002: haven't heard from client daad225e-93c9-2408-d9a9-77757e42fc1e (at 10.9.108.45@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9a91a7fcc800, cur 1574455562 expire 1574455412 last 1574455335 [1623353.599259] Lustre: fir-MDT0002: Connection restored to (at 10.9.116.2@o2ib4) [1625094.244490] Lustre: fir-MDT0002: Connection restored to (at 10.9.108.45@o2ib4) [1631424.503789] Lustre: fir-MDT0002: haven't heard from client d25b1b80-7414-e8a5-e9c4-583332d40e57 (at 10.9.117.21@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a8ff4221c00, cur 1574463757 expire 1574463607 last 1574463530 [1632587.539352] Lustre: fir-MDT0002: haven't heard from client 134d357a-64a2-8eb5-6001-b6a8da5ec16f (at 10.9.116.9@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a54fb1f2c00, cur 1574464920 expire 1574464770 last 1574464693 [1633729.487110] Lustre: fir-MDT0002: Connection restored to (at 10.9.112.14@o2ib4) [1634221.724910] Lustre: fir-MDT0002: Connection restored to 216ed9dd-7aee-ce3b-c0ae-292bb9ebb207 (at 10.9.117.1@o2ib4) [1634235.600660] Lustre: fir-MDT0002: Connection restored to (at 10.9.117.13@o2ib4) [1634252.920105] Lustre: fir-MDT0002: Connection restored to (at 10.9.110.65@o2ib4) [1634261.811009] Lustre: fir-MDT0002: Connection restored to (at 10.9.103.58@o2ib4) [1634301.718285] Lustre: fir-MDT0002: Connection restored to (at 10.8.7.20@o2ib6) [1634317.847979] Lustre: fir-MDT0002: Connection restored to (at 10.9.106.34@o2ib4) [1634411.159354] Lustre: fir-MDT0002: Connection restored to (at 10.9.107.37@o2ib4) [1634495.490590] Lustre: fir-MDT0002: Connection restored to (at 10.9.108.10@o2ib4) [1634495.498081] Lustre: Skipped 1 previous similar message [1634694.143570] Lustre: fir-MDT0002: Connection restored to (at 10.8.24.31@o2ib6) [1634694.150978] Lustre: Skipped 1 previous similar message [1639398.579396] Lustre: fir-MDT0002: Connection restored to (at 10.9.113.7@o2ib4) [1639398.586799] Lustre: Skipped 9 previous similar messages [1639478.296378] Lustre: fir-MDT0002: Connection restored to 2149710b-dbf2-1203-217d-316b04640a56 (at 10.9.115.6@o2ib4) [1639478.306903] Lustre: Skipped 1 previous similar message [1640950.311275] Lustre: fir-MDT0002: Connection restored to (at 10.9.106.43@o2ib4) [1641067.756741] Lustre: fir-MDT0002: haven't heard from client 8163b191-2600-6747-daac-584fd8ae2e09 (at 10.9.101.36@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9a91bed0d400, cur 1574473400 expire 1574473250 last 1574473173 [1641067.778711] Lustre: Skipped 1 previous similar message [1641080.566161] Lustre: fir-MDT0002: Connection restored to (at 10.9.106.72@o2ib4) [1641104.620630] Lustre: fir-MDT0002: Connection restored to (at 10.9.108.43@o2ib4) [1642171.165141] Lustre: fir-MDT0002: Connection restored to (at 10.9.114.8@o2ib4) [1642210.402628] Lustre: fir-MDT0002: Connection restored to (at 10.9.112.16@o2ib4) [1642313.142976] Lustre: fir-MDT0002: Connection restored to (at 10.9.115.3@o2ib4) [1642457.470486] Lustre: fir-MDT0002: Connection restored to (at 10.9.117.21@o2ib4) [1643205.555849] Lustre: fir-MDT0002: Connection restored to (at 10.9.116.6@o2ib4) [1643245.162826] Lustre: fir-MDT0002: Connection restored to (at 10.9.105.11@o2ib4) [1643281.805568] Lustre: fir-MDT0002: Connection restored to (at 10.9.101.1@o2ib4) [1643297.992153] Lustre: fir-MDT0002: Connection restored to (at 10.9.101.20@o2ib4) [1643297.999646] Lustre: Skipped 2 previous similar messages [1643332.732412] Lustre: fir-MDT0002: Connection restored to (at 10.9.101.41@o2ib4) [1643332.739906] Lustre: Skipped 4 previous similar messages [1643408.737774] Lustre: fir-MDT0002: Connection restored to (at 10.9.105.33@o2ib4) [1643408.745278] Lustre: Skipped 8 previous similar messages [1643749.769578] Lustre: fir-MDT0002: Connection restored to (at 10.8.23.14@o2ib6) [1643749.776980] Lustre: Skipped 5 previous similar messages [1643797.829126] Lustre: fir-MDT0002: haven't heard from client ea3e168b-2916-9a39-aeb8-0a16888f6d50 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a68c7f3e000, cur 1574476130 expire 1574475980 last 1574475903 [1643797.851024] Lustre: Skipped 6 previous similar messages [1653808.085946] Lustre: fir-MDT0002: Connection restored to (at 10.8.23.14@o2ib6) [1653865.092043] Lustre: fir-MDT0002: haven't heard from client 78051e84-4ba6-51ef-d535-b90139ca4ad4 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6cb873f400, cur 1574486197 expire 1574486047 last 1574485970 [1657197.179381] Lustre: fir-MDT0002: haven't heard from client 73e3579d-eae3-0a5d-abba-a217983294df (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a65515f6800, cur 1574489529 expire 1574489379 last 1574489302 [1657279.668725] Lustre: fir-MDT0002: Connection restored to (at 10.8.23.14@o2ib6) [1662906.236978] Lustre: fir-MDT0002: Connection restored to (at 10.9.108.15@o2ib4) [1667219.444973] Lustre: fir-MDT0002: haven't heard from client 6c65e857-8719-c424-61f9-32ab04911f04 (at 10.9.101.42@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a8317aed000, cur 1574499551 expire 1574499401 last 1574499324 [1669467.571518] Lustre: fir-MDT0002: Connection restored to a402a874-21a6-76c0-04c7-cb9a15009f9d (at 10.9.101.42@o2ib4) [1688582.179677] LNetError: 59267:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [1690670.538352] LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.0.10.3@o2ib7 (no target). If you are running an HA pair check that the target is mounted on the other server. [1690670.555727] LustreError: Skipped 2737 previous similar messages [1690922.318467] LNet: Service thread pid 59396 was inactive for 200.67s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: [1690922.335580] Pid: 59396, comm: mdt01_000 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 SMP Mon Aug 5 15:28:37 PDT 2019 [1690922.345943] Call Trace: [1690922.348588] [] ldlm_completion_ast+0x430/0x860 [ptlrpc] [1690922.355738] [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] [1690922.363022] [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] [1690922.369913] [] osp_md_object_lock+0x162/0x2d0 [osp] [1690922.376655] [] lod_object_lock+0xf3/0x7b0 [lod] [1690922.383079] [] mdd_object_lock+0x3e/0xe0 [mdd] [1690922.389388] [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] [1690922.396854] [] mdt_remote_object_lock+0x2a/0x30 [mdt] [1690922.403773] [] mdt_rename_lock+0xbe/0x4b0 [mdt] [1690922.410199] [] mdt_reint_rename+0x2c5/0x2b90 [mdt] [1690922.416861] [] mdt_reint_rec+0x83/0x210 [mdt] [1690922.423112] [] mdt_reint_internal+0x6e3/0xaf0 [mdt] [1690922.429854] [] mdt_reint+0x67/0x140 [mdt] [1690922.435779] [] tgt_request_handle+0xaea/0x1580 [ptlrpc] [1690922.442897] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [1690922.450810] [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [1690922.457308] [] kthread+0xd1/0xe0 [1690922.462413] [] ret_from_fork_nospec_begin+0xe/0x21 [1690922.469068] [] 0xffffffffffffffff [1690922.474280] LustreError: dumping log to /tmp/lustre-log.1574523253.59396 [1690966.351606] LNet: Service thread pid 62562 was inactive for 200.52s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: [1690966.368758] Pid: 62562, comm: mdt00_027 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 SMP Mon Aug 5 15:28:37 PDT 2019 [1690966.379104] Call Trace: [1690966.381748] [] ldlm_completion_ast+0x430/0x860 [ptlrpc] [1690966.388886] [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] [1690966.396181] [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] [1690966.403031] [] osp_md_object_lock+0x162/0x2d0 [osp] [1690966.409782] [] lod_object_lock+0xf3/0x7b0 [lod] [1690966.416179] [] mdd_object_lock+0x3e/0xe0 [mdd] [1690966.422487] [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] [1690966.429938] [] mdt_remote_object_lock+0x2a/0x30 [mdt] [1690966.436862] [] mdt_rename_lock+0xbe/0x4b0 [mdt] [1690966.443262] [] mdt_reint_rename+0x2c5/0x2b90 [mdt] [1690966.449933] [] mdt_reint_rec+0x83/0x210 [mdt] [1690966.456159] [] mdt_reint_internal+0x6e3/0xaf0 [mdt] [1690966.462893] [] mdt_reint+0x67/0x140 [mdt] [1690966.468760] [] tgt_request_handle+0xaea/0x1580 [ptlrpc] [1690966.475899] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [1690966.483796] [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [1690966.490296] [] kthread+0xd1/0xe0 [1690966.495385] [] ret_from_fork_nospec_begin+0xe/0x21 [1690966.502047] [] 0xffffffffffffffff [1690966.507244] LustreError: dumping log to /tmp/lustre-log.1574523297.62562 [1691021.646044] Lustre: fir-MDT0000-osp-MDT0002: Connection to fir-MDT0000 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete [1691021.662219] LustreError: 59396:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1574523052, 300s ago), entering recovery for fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff9a631ba4bf00/0xebb19298b16c5593 lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 3 type: IBT flags: 0x1000001000000 nid: local remote: 0x2bdea504ace8aab8 expref: -99 pid: 59396 timeout: 0 lvb_type: 0 [1691021.727688] Lustre: fir-MDT0000-osp-MDT0002: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7) 
[1691065.830194] LustreError: 62562:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1574523096, 300s ago), entering recovery for fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff9a577bcada00/0xebb19298b18e8f0f lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 3 type: IBT flags: 0x1000001000000 nid: local remote: 0x2bdea504ad1678fa expref: -99 pid: 62562 timeout: 0 lvb_type: 0 [1691316.308670] Lustre: 62789:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff9a90f2e06300 x1649292030157216/t0(0) o36->181fb1c0-d783-7b4a-9f82-4887555b1095@10.8.0.68@o2ib6:597/0 lens 656/2888 e 23 to 0 dl 1574523652 ref 2 fl Interpret:/0/0 rc 0/0 [1691322.706475] Lustre: fir-MDT0002: Client 181fb1c0-d783-7b4a-9f82-4887555b1095 (at 10.8.0.68@o2ib6) reconnecting [1691322.716686] Lustre: fir-MDT0002: Connection restored to 181fb1c0-d783-7b4a-9f82-4887555b1095 (at 10.8.0.68@o2ib6) [1691360.233804] Lustre: 62896:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff9a5eab3d4380 x1648594499288416/t0(0) o36->56ec1425-5381-0678-b174-d4693bd27d63@10.9.106.30@o2ib4:641/0 lens 568/2888 e 21 to 0 dl 1574523696 ref 2 fl Interpret:/0/0 rc 0/0 [1691366.911491] Lustre: fir-MDT0002: Client 56ec1425-5381-0678-b174-d4693bd27d63 (at 10.9.106.30@o2ib4) reconnecting [1691366.921866] Lustre: fir-MDT0002: Connection restored to 56ec1425-5381-0678-b174-d4693bd27d63 (at 10.9.106.30@o2ib4) [1691923.806217] Lustre: fir-MDT0002: Client 181fb1c0-d783-7b4a-9f82-4887555b1095 (at 10.8.0.68@o2ib6) reconnecting [1691923.816414] Lustre: fir-MDT0002: Connection restored to 181fb1c0-d783-7b4a-9f82-4887555b1095 (at 10.8.0.68@o2ib6) [1691968.010721] Lustre: fir-MDT0002: Client 56ec1425-5381-0678-b174-d4693bd27d63 (at 10.9.106.30@o2ib4) reconnecting [1691968.021102] Lustre: fir-MDT0002: Connection restored to 56ec1425-5381-0678-b174-d4693bd27d63 (at 10.9.106.30@o2ib4) [1692524.905852] Lustre: fir-MDT0002: Client 181fb1c0-d783-7b4a-9f82-4887555b1095 (at 10.8.0.68@o2ib6) reconnecting [1692524.916053] Lustre: fir-MDT0002: Connection restored to 181fb1c0-d783-7b4a-9f82-4887555b1095 (at 10.8.0.68@o2ib6) [1692569.104371] Lustre: fir-MDT0002: Client 56ec1425-5381-0678-b174-d4693bd27d63 (at 10.9.106.30@o2ib4) reconnecting [1692569.114752] Lustre: fir-MDT0002: Connection restored to 56ec1425-5381-0678-b174-d4693bd27d63 (at 10.9.106.30@o2ib4) [1693126.005746] Lustre: fir-MDT0002: Client 181fb1c0-d783-7b4a-9f82-4887555b1095 (at 10.8.0.68@o2ib6) reconnecting [1693126.015942] Lustre: fir-MDT0002: Connection restored to 181fb1c0-d783-7b4a-9f82-4887555b1095 (at 10.8.0.68@o2ib6) [1693170.199154] Lustre: fir-MDT0002: Client 56ec1425-5381-0678-b174-d4693bd27d63 (at 10.9.106.30@o2ib4) reconnecting [1693727.106414] Lustre: fir-MDT0002: Client 181fb1c0-d783-7b4a-9f82-4887555b1095 (at 10.8.0.68@o2ib6) reconnecting [1693727.116615] Lustre: fir-MDT0002: Connection restored to 181fb1c0-d783-7b4a-9f82-4887555b1095 (at 10.8.0.68@o2ib6) [1693727.127062] Lustre: Skipped 1 previous similar message [1693988.576763] LNet: Service thread pid 59396 completed after 3266.85s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
[1693988.593199] LNet: Skipped 2 previous similar messages [1708468.302253] LNetError: 59267:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [1710186.549094] Lustre: fir-MDT0002: haven't heard from client 500cb7bd-5524-eaac-a4c4-92909bfd3ce5 (at 10.9.101.40@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a91bedf6000, cur 1574542517 expire 1574542367 last 1574542290 [1712561.745763] Lustre: fir-MDT0002: Connection restored to (at 10.9.101.40@o2ib4) [1712561.753256] Lustre: Skipped 1 previous similar message [1721603.354470] perf: interrupt took too long (5090 > 5083), lowering kernel.perf_event_max_sample_rate to 39000 [1731779.494654] LNetError: 59275:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [1771332.112674] Lustre: fir-MDT0002: haven't heard from client 681df957-0ffe-68d6-8961-6bd10a8dfb89 (at 10.9.104.29@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a8ace36d800, cur 1574603661 expire 1574603511 last 1574603434 [1789261.076169] Lustre: 62553:0:(mdd_device.c:1807:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 3: -22 [1789261.577234] Lustre: 62567:0:(mdd_device.c:1807:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 3: -22 [1789261.589056] Lustre: 62567:0:(mdd_device.c:1807:mdd_changelog_clear()) Skipped 536 previous similar messages [1789262.576897] Lustre: 59731:0:(mdd_device.c:1807:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 3: -22 [1789262.588726] Lustre: 59731:0:(mdd_device.c:1807:mdd_changelog_clear()) Skipped 1031 previous similar messages [1789264.577295] Lustre: 59731:0:(mdd_device.c:1807:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 3: -22 [1789264.589125] Lustre: 59731:0:(mdd_device.c:1807:mdd_changelog_clear()) Skipped 1892 previous similar messages [1789268.576869] Lustre: 62558:0:(mdd_device.c:1807:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 3: -22 [1789268.588694] Lustre: 62558:0:(mdd_device.c:1807:mdd_changelog_clear()) Skipped 3920 previous similar messages [1796321.746096] Lustre: 97200:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 6s req@ffff9a60d4f04800 x1649303198559808/t0(0) o103->c5c1b41e-f3c0-c589-3194-4020139c27d9@10.8.17.26@o2ib6:0/0 lens 328/0 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1 [1796321.771880] Lustre: 97200:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 26 previous similar messages [1796322.248378] Lustre: 62533:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 6s req@ffff9a57e6b38850 x1650018272694416/t0(0) o103->0e05a585-7a91-6bd2-9b87-c660613c3bf4@10.0.10.3@o2ib7:0/0 lens 3584/0 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1 [1796322.274279] Lustre: 62533:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 479 previous similar messages [1807296.517998] LNetError: 59267:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) [1807303.322519] Lustre: fir-MDT0002: Client 63176e14-4cab-b518-1a3a-dffe286adc75 (at 10.9.110.32@o2ib4) reconnecting [1807303.332882] Lustre: Skipped 1 previous similar message [1807303.338225] Lustre: fir-MDT0002: Connection restored to (at 10.9.110.32@o2ib4) [1838086.678862] Lustre: fir-MDT0002: Connection restored to (at 10.8.15.8@o2ib6) [1870463.624302] Lustre: fir-MDT0002: haven't 
heard from client c958ebb4-8b5e-2d38-80fc-9485b1ec2704 (at 10.9.108.15@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a919b39ec00, cur 1574702790 expire 1574702640 last 1574702563 [1872063.612903] Lustre: fir-MDT0002: Connection restored to (at 10.9.117.21@o2ib4) [1872340.959021] Lustre: fir-MDT0002: Connection restored to (at 10.9.108.15@o2ib4) [1876737.132786] Lustre: fir-MDT0002: Connection restored to (at 10.9.104.29@o2ib4) [1885156.754063] Lustre: fir-MDT0002: Client 0e05a585-7a91-6bd2-9b87-c660613c3bf4 (at 10.0.10.3@o2ib7) reconnecting [1885156.764354] Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) [1893188.776496] Lustre: fir-MDT0002: Connection restored to (at 10.9.108.26@o2ib4) [1893848.526934] LNetError: 59267:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [1934574.680997] Lustre: fir-MDT0002: Client 0e05a585-7a91-6bd2-9b87-c660613c3bf4 (at 10.0.10.3@o2ib7) reconnecting [1934574.691225] Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) [1934660.663569] Lustre: fir-MDT0002: Client 0e05a585-7a91-6bd2-9b87-c660613c3bf4 (at 10.0.10.3@o2ib7) reconnecting [1934660.673772] Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) [1942346.755752] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.0.10.3@o2ib7 (no target). If you are running an HA pair check that the target is mounted on the other server. [1942526.679309] LNetError: 59266:0:(o2iblnd_cb.c:3350:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds [1942526.689660] LNetError: 59266:0:(o2iblnd_cb.c:3425:kiblnd_check_conns()) Timed out RDMA with 10.0.10.3@o2ib7 (105): c: 6, oc: 0, rc: 8 [1942647.535267] Lustre: fir-MDT0002: haven't heard from client 0e05a585-7a91-6bd2-9b87-c660613c3bf4 (at 10.0.10.3@o2ib7) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a52d9021800, cur 1574774972 expire 1574774822 last 1574774745 [1942647.557059] Lustre: Skipped 1 previous similar message [1942827.523167] Lustre: fir-MDT0002: haven't heard from client 2e1bafae-c897-2239-17ee-ebc409fb6d85 (at 10.8.15.8@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a770dbed400, cur 1574775152 expire 1574775002 last 1574774925 [1970204.963141] Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) [1971138.162798] LNetError: 59267:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [2041221.065095] Lustre: fir-MDT0002: haven't heard from client 7c1b5364-eff3-9889-5d39-e418d2231d7b (at 10.9.102.18@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9a91ba0d6c00, cur 1574873543 expire 1574873393 last 1574873316 [2042532.440353] Lustre: fir-MDT0002: Connection restored to (at 10.9.103.58@o2ib4) [2043144.491689] Lustre: fir-MDT0002: Connection restored to (at 10.9.107.53@o2ib4) [2043424.619456] Lustre: fir-MDT0002: Connection restored to (at 10.9.101.15@o2ib4) [2043463.030113] Lustre: fir-MDT0002: Connection restored to (at 10.9.102.18@o2ib4) [2043502.784268] Lustre: fir-MDT0002: Connection restored to (at 10.9.102.20@o2ib4) [2050249.721557] Lustre: 59742:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1574882564/real 1574882564] req@ffff9a91b208a880 x1649846266454336/t0(0) o101->fir-MDT0000-lwp-MDT0002@10.0.10.51@o2ib7:23/10 lens 456/496 e 0 to 1 dl 1574882571 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1 [2050249.725567] Lustre: fir-MDT0000-lwp-MDT0002: Connection to fir-MDT0000 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete [2050249.766266] Lustre: 59742:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 5 previous similar messages [2050249.799991] LustreError: 59436:0:(osd_quota.c:708:osd_declare_inode_qid()) force to ignore quota flags =8 [2050250.466578] LNetError: 59266:0:(o2iblnd_cb.c:3350:kiblnd_check_txs_locked()) Timed out tx: tx_queue, 0 seconds [2050250.476752] LNetError: 59266:0:(o2iblnd_cb.c:3425:kiblnd_check_conns()) Timed out RDMA with 10.0.10.51@o2ib7 (6): c: 0, oc: 0, rc: 8 [2050250.489134] LNetError: 59266:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900 [2050250.501224] LNetError: 59266:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 3 previous similar messages [2050252.224630] Lustre: fir-MDT0000-osp-MDT0002: Connection to fir-MDT0000 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete [2050256.466748] LNet: 59266:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.51@o2ib7: 0 seconds [2050256.477009] LNet: 59266:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 7 previous similar messages [2050277.467276] LNet: 59266:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.51@o2ib7: 1 seconds [2050277.477538] LNet: 59266:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 8 previous similar messages [2050285.794492] Lustre: 59328:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1574882598/real 0] req@ffff9a918676d580 x1649846266488528/t0(0) o400->MGC10.0.10.51@o2ib7@10.0.10.51@o2ib7:26/25 lens 224/224 e 0 to 1 dl 1574882605 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1 [2050285.821922] Lustre: 59328:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 5 previous similar messages [2050285.831839] LustreError: 166-1: MGC10.0.10.51@o2ib7: Connection to MGS (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will fail [2050288.467564] LNetError: 59266:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900 [2050288.479648] LNetError: 59266:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 8 previous similar messages [2050299.535933] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.110.54@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. 
[2050299.553481] LustreError: Skipped 1 previous similar message [2050300.620193] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.103.66@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. [2050300.637740] LustreError: Skipped 39 previous similar messages [2050302.694985] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.8.23.20@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. [2050302.712441] LustreError: Skipped 96 previous similar messages [2050306.712416] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.8.20.8@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. [2050306.729791] LustreError: Skipped 257 previous similar messages [2050311.468149] LNet: 59266:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.51@o2ib7: 1 seconds [2050311.478407] LNet: 59266:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 7 previous similar messages [2050314.724009] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.105.27@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. [2050314.741555] LustreError: Skipped 327 previous similar messages [2050332.329591] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.107.12@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. [2050332.347156] LustreError: Skipped 654 previous similar messages [2050333.003505] LustreError: 85979:0:(osd_quota.c:708:osd_declare_inode_qid()) force to ignore quota flags =8 [2050333.003619] LustreError: 62989:0:(osd_quota.c:708:osd_declare_inode_qid()) force to ignore quota flags =8 [2050364.469527] LNetError: 59266:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900 [2050364.481613] LNetError: 59266:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 6 previous similar messages [2050373.728667] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.102.7@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. [2050373.746134] LustreError: Skipped 3 previous similar messages [2050376.469842] LNet: 59266:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.51@o2ib7: 0 seconds [2050376.480103] LNet: 59266:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 7 previous similar messages [2050440.331884] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.107.37@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. [2050440.349427] LustreError: Skipped 1383 previous similar messages [2050469.342807] Lustre: fir-MDT0002: haven't heard from client fir-MDT0000-mdtlov_UUID (at 10.0.10.51@o2ib7) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a8bc97da000, cur 1574882791 expire 1574882641 last 1574882564 [2050469.363560] Lustre: Skipped 4 previous similar messages [2050499.473019] LNetError: 59266:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. 
Health = 900 [2050499.485103] LNetError: 59266:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 20 previous similar messages [2050505.473175] LNet: 59266:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.51@o2ib7: 0 seconds [2050505.483429] LNet: 59266:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 18 previous similar messages [2050574.437700] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.102.7@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. [2050574.455156] LustreError: Skipped 1389 previous similar messages [2050835.511744] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.102.34@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. [2050835.529288] LustreError: Skipped 2779 previous similar messages [2050989.359529] Lustre: fir-MDT0002: Connection restored to (at 10.0.10.51@o2ib7) [2051013.372466] Lustre: Evicted from MGS (at MGC10.0.10.51@o2ib7_0) after server handle changed from 0x2bdea4ee9eb0a9ce to 0x9afea48ca36dcd [2051013.384946] Lustre: MGC10.0.10.51@o2ib7: Connection restored to MGC10.0.10.51@o2ib7_0 (at 10.0.10.51@o2ib7) [2051038.461152] LustreError: 167-0: fir-MDT0000-lwp-MDT0002: This client was evicted by fir-MDT0000; in progress operations using this service will fail. [2051038.485814] Lustre: fir-MDT0000-lwp-MDT0002: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7) [2051449.175824] LustreError: 11-0: fir-MDT0000-osp-MDT0002: operation mds_statfs to node 10.0.10.51@o2ib7 failed: rc = -107 [2051449.186780] Lustre: fir-MDT0000-osp-MDT0002: Connection to fir-MDT0000 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete [2051451.085108] LustreError: 11-0: fir-MDT0000-lwp-MDT0002: operation quota_acquire to node 10.0.10.51@o2ib7 failed: rc = -107 [2051451.096328] Lustre: fir-MDT0000-lwp-MDT0002: Connection to fir-MDT0000 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete [2051466.444706] Lustre: Failing over fir-MDT0002 [2051466.517218] Lustre: fir-MDT0002: Not available for connect from 10.9.112.2@o2ib4 (stopping) [2051466.742781] LustreError: 96273:0:(ldlm_lockd.c:2324:ldlm_cancel_handler()) ldlm_cancel from 10.8.27.27@o2ib6 arrived at 1574883788 with bad export cookie 16983516761410369393 [2051466.758530] LustreError: 96273:0:(ldlm_lock.c:2710:ldlm_lock_dump_handle()) ### ### ns: mdt-fir-MDT0002_UUID lock: ffff9a59a44cf740/0xebb192a65a0c2b45 lrc: 3/0,0 mode: PR/PR res: [0x2c00335e1:0x6885:0x0].0x0 bits 0x1b/0x0 rrc: 5 type: IBT flags: 0x40200000000000 nid: 10.8.27.27@o2ib6 remote: 0xd4ff6a9df5ab79e expref: 282 pid: 62553 timeout: 0 lvb_type: 0 [2051467.020753] Lustre: fir-MDT0002: Not available for connect from 10.9.103.44@o2ib4 (stopping) [2051467.029365] Lustre: Skipped 25 previous similar messages [2051467.621777] LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.9.106.56@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. 
[2051467.628166] LustreError: 62533:0:(ldlm_lockd.c:2324:ldlm_cancel_handler()) ldlm_cancel from 10.8.8.25@o2ib6 arrived at 1574883789 with bad export cookie 16983516761410370506 [2051467.628169] LustreError: 62533:0:(ldlm_lockd.c:2324:ldlm_cancel_handler()) Skipped 7 previous similar messages [2051467.628184] LustreError: 62533:0:(ldlm_lock.c:2710:ldlm_lock_dump_handle()) ### ### ns: mdt-fir-MDT0002_UUID lock: ffff9a6d35815a00/0xebb192a65a0f44f3 lrc: 3/0,0 mode: PR/PR res: [0x2c0032247:0xe81e:0x0].0x0 bits 0x1b/0x0 rrc: 4 type: IBT flags: 0x40200000000000 nid: 10.8.8.25@o2ib6 remote: 0x1b6f555e8646c681 expref: 1087 pid: 62893 timeout: 0 lvb_type: 0 [2051467.628186] LustreError: 62533:0:(ldlm_lock.c:2710:ldlm_lock_dump_handle()) Skipped 5 previous similar messages [2051467.706959] LustreError: Skipped 2435 previous similar messages [2051468.035526] Lustre: fir-MDT0002: Not available for connect from 10.9.108.55@o2ib4 (stopping) [2051468.044148] Lustre: Skipped 43 previous similar messages [2051468.720604] LustreError: 108512:0:(ldlm_lockd.c:2324:ldlm_cancel_handler()) ldlm_cancel from 10.9.104.21@o2ib4 arrived at 1574883790 with bad export cookie 16983516761410368791 [2051468.736505] LustreError: 108512:0:(ldlm_lockd.c:2324:ldlm_cancel_handler()) Skipped 5 previous similar messages [2051468.746787] LustreError: 108512:0:(ldlm_lock.c:2710:ldlm_lock_dump_handle()) ### ### ns: mdt-fir-MDT0002_UUID lock: ffff9a57224b6300/0xebb192a659ea2404 lrc: 3/0,0 mode: CR/CR res: [0x2c0033ed7:0x13d40:0x0].0x0 bits 0x9/0x0 rrc: 20 type: IBT flags: 0x40200000000000 nid: 10.9.104.21@o2ib4 remote: 0xbcdf5a6360e9a7c7 expref: 481 pid: 62893 timeout: 0 lvb_type: 0 [2051468.778639] LustreError: 108512:0:(ldlm_lock.c:2710:ldlm_lock_dump_handle()) Skipped 2 previous similar messages [2051470.041266] Lustre: fir-MDT0002: Not available for connect from 10.9.117.5@o2ib4 (stopping) [2051470.049805] Lustre: Skipped 96 previous similar messages [2051471.170215] LustreError: 108512:0:(ldlm_lockd.c:2324:ldlm_cancel_handler()) ldlm_cancel from 10.9.108.24@o2ib4 arrived at 1574883792 with bad export cookie 16983516761410371878 [2051471.186118] LustreError: 108512:0:(ldlm_lockd.c:2324:ldlm_cancel_handler()) Skipped 8 previous similar messages [2051471.373498] LustreError: 96273:0:(ldlm_lock.c:2710:ldlm_lock_dump_handle()) ### ### ns: mdt-fir-MDT0002_UUID lock: ffff9a680608f740/0xebb192a65a2b974f lrc: 3/0,0 mode: PR/PR res: [0x2c0034055:0xaca7:0x0].0x0 bits 0x20/0x0 rrc: 4 type: IBT flags: 0x40200000000000 nid: 10.8.27.35@o2ib6 remote: 0x84ff9c498dac6bb6 expref: 15513 pid: 62870 timeout: 0 lvb_type: 0 [2051471.405276] LustreError: 96273:0:(ldlm_lock.c:2710:ldlm_lock_dump_handle()) Skipped 4 previous similar messages [2051472.626592] Lustre: 42589:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1574883788/real 1574883788] req@ffff9a6316a09680 x1649846267298992/t0(0) o9->fir-OST0008-osc-MDT0002@10.0.10.101@o2ib7:28/4 lens 224/224 e 0 to 1 dl 1574883794 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1 [2051474.045758] Lustre: fir-MDT0002: Not available for connect from 10.8.31.8@o2ib6 (stopping) [2051474.054227] Lustre: Skipped 284 previous similar messages [2051475.494852] LustreError: 62533:0:(ldlm_lockd.c:2324:ldlm_cancel_handler()) ldlm_cancel from 10.9.114.8@o2ib4 arrived at 1574883797 with bad export cookie 16983516850135948496 [2051475.510593] LustreError: 62533:0:(ldlm_lockd.c:2324:ldlm_cancel_handler()) Skipped 18 previous similar messages [2051476.208844] LustreError: 
96273:0:(ldlm_lock.c:2710:ldlm_lock_dump_handle()) ### ### ns: mdt-fir-MDT0002_UUID lock: ffff9a72fb5e7740/0xebb192a64a7ec6cb lrc: 3/0,0 mode: PR/PR res: [0x2c0032d2d:0x77c2:0x0].0x0 bits 0x1b/0x0 rrc: 9 type: IBT flags: 0x40200000000000 nid: 10.9.102.21@o2ib4 remote: 0x65dcc84cc79a9eba expref: 278 pid: 62874 timeout: 0 lvb_type: 0 [2051476.240521] LustreError: 96273:0:(ldlm_lock.c:2710:ldlm_lock_dump_handle()) Skipped 4 previous similar messages [2051478.715436] Lustre: 42589:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1574883794/real 1574883794] req@ffff9a5eab3d0900 x1649846267301504/t0(0) o9->fir-OST001d-osc-MDT0002@10.0.10.106@o2ib7:28/4 lens 224/224 e 0 to 1 dl 1574883800 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1 [2051482.059001] Lustre: fir-MDT0002: Not available for connect from 10.9.102.35@o2ib4 (stopping) [2051482.067618] Lustre: Skipped 454 previous similar messages [2051484.743587] Lustre: 42589:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1574883800/real 1574883800] req@ffff9a6559dede80 x1649846267303504/t0(0) o9->fir-OST001e-osc-MDT0002@10.0.10.105@o2ib7:28/4 lens 224/224 e 0 to 1 dl 1574883806 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1 [2051484.887432] LustreError: 62454:0:(ldlm_lockd.c:2324:ldlm_cancel_handler()) ldlm_cancel from 10.9.105.6@o2ib4 arrived at 1574883806 with bad export cookie 16983516761410370394 [2051484.903159] LustreError: 62454:0:(ldlm_lockd.c:2324:ldlm_cancel_handler()) Skipped 19 previous similar messages [2051490.775847] LustreError: 59578:0:(osp_precreate.c:656:osp_precreate_send()) fir-OST002b-osc-MDT0002: can't precreate: rc = -5 [2051490.787326] LustreError: 59578:0:(osp_precreate.c:1312:osp_precreate_thread()) fir-OST002b-osc-MDT0002: cannot precreate objects: rc = -5 [2051496.775893] Lustre: 42589:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1574883812/real 1574883812] req@ffff9a6dc6a18d80 x1649846267306160/t0(0) o9->fir-OST002b-osc-MDT0002@10.0.10.108@o2ib7:28/4 lens 224/224 e 0 to 1 dl 1574883818 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1 [2051496.804269] Lustre: 42589:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 1 previous similar message [2051498.936693] Lustre: fir-MDT0002: Not available for connect from 10.9.109.65@o2ib4 (stopping) [2051498.945310] Lustre: Skipped 461 previous similar messages [2051502.843816] LustreError: 42589:0:(osp_object.c:594:osp_attr_get()) fir-MDT0000-osp-MDT0002:osp_attr_get update error [0x20000000a:0x0:0x0]: rc = -108 [2051502.857371] LustreError: 42589:0:(llog_cat.c:444:llog_cat_close()) fir-MDT0000-osp-MDT0002: failure destroying log during cleanup: rc = -108 [2051508.913443] Lustre: server umount fir-MDT0002 complete [2051544.926824] LNetError: 107482:0:(o2iblnd_cb.c:2495:kiblnd_passive_connect()) Can't accept conn from 10.0.10.202@o2ib7 on NA (ib0:1:10.0.10.53): bad dst nid 10.0.10.53@o2ib7 [2051545.505574] LNetError: 107482:0:(o2iblnd_cb.c:2495:kiblnd_passive_connect()) Can't accept conn from 10.0.10.211@o2ib7 on NA (ib0:1:10.0.10.53): bad dst nid 10.0.10.53@o2ib7 [2051545.521143] LNetError: 107482:0:(o2iblnd_cb.c:2495:kiblnd_passive_connect()) Skipped 15 previous similar messages [2051546.919187] LNet: Removed LNI 10.0.10.53@o2ib7 [2052155.368209] LNet: HW NUMA nodes: 4, HW CPU cores: 48, npartitions: 4 [2052155.375967] alg: No test for adler32 (adler32-zlib) [2052156.210716] Lustre: Lustre: Build Version: 2.12.3_2_gb033996 [2052156.342828] LNet: 
43200:0:(config.c:1627:lnet_inet_enumerate()) lnet: Ignoring interface em2: it's down [2052156.352575] LNet: Using FastReg for registration [2052156.370948] LNet: Added LNI 10.0.10.53@o2ib7 [8/256/0/180] [2052233.941318] LDISKFS-fs (dm-0): file extents enabled, maximum tree depth=5 [2052234.035489] LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,acl,no_mbcache,nodelalloc [2052234.656684] LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.21.27@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. [2052234.674144] LustreError: Skipped 2 previous similar messages [2052234.746379] Lustre: fir-MDT0002: Not available for connect from 10.9.106.35@o2ib4 (not set up) [2052235.269133] Lustre: fir-MDT0002: Not available for connect from 10.8.0.82@o2ib6 (not set up) [2052235.277749] Lustre: Skipped 21 previous similar messages [2052235.439661] LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.9.106.6@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. [2052235.457143] LustreError: Skipped 3 previous similar messages [2052236.287023] Lustre: fir-MDT0002: Not available for connect from 10.9.108.70@o2ib4 (not set up) [2052236.295805] Lustre: Skipped 32 previous similar messages [2052236.455603] LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.8.25@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. [2052236.472992] LustreError: Skipped 6 previous similar messages [2052236.956574] LustreError: 11-0: fir-MDT0000-osp-MDT0002: operation mds_connect to node 10.0.10.51@o2ib7 failed: rc = -114 [2052237.188340] Lustre: fir-MDT0002: Imperative Recovery not enabled, recovery window 300-900 [2052237.392116] Lustre: fir-MDD0002: changelog on [2052237.397857] Lustre: fir-MDT0002: in recovery but waiting for the first client to connect [2052237.433937] Lustre: fir-MDT0002: Will be in recovery for at least 5:00, or until 1288 clients reconnect [2052238.435562] Lustre: fir-MDT0002: Connection restored to (at 10.8.26.1@o2ib6) [2052238.442895] Lustre: Skipped 13 previous similar messages [2052238.683944] LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.9.102.22@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. [2052238.701487] LustreError: Skipped 9 previous similar messages [2052238.944704] Lustre: fir-MDT0002: Connection restored to 9db57ad7-cf1f-fbcb-3def-7638c45029a7 (at 10.9.106.46@o2ib4) [2052238.955329] Lustre: Skipped 32 previous similar messages [2052239.966196] Lustre: fir-MDT0002: Connection restored to 7b82293a-73f8-138a-13e4-d48833d3398a (at 10.9.101.38@o2ib4) [2052239.976808] Lustre: Skipped 25 previous similar messages [2052242.025636] Lustre: fir-MDT0002: Connection restored to 5ffd693b-b56d-f177-b282-8dc96b21253f (at 10.9.102.38@o2ib4) [2052242.036244] Lustre: Skipped 203 previous similar messages [2052244.134740] LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.0.10.3@o2ib7 (no target). If you are running an HA pair check that the target is mounted on the other server. 
[2052244.152110] LustreError: Skipped 12 previous similar messages [2052246.160959] Lustre: fir-MDT0002: Connection restored to ba05b1e3-8e67-ac94-d8b7-561ea162dd20 (at 10.9.105.40@o2ib4) [2052246.171572] Lustre: Skipped 310 previous similar messages [2052253.165542] LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.0.10.102@o2ib7 (no target). If you are running an HA pair check that the target is mounted on the other server. [2052253.183087] LustreError: Skipped 77 previous similar messages [2052254.183137] Lustre: fir-MDT0002: Connection restored to 34e4932f-aa61-0617-6986-8c885c948b7a (at 10.9.106.32@o2ib4) [2052254.193747] Lustre: Skipped 152 previous similar messages [2052262.237182] LustreError: 11-0: fir-OST0000-osc-MDT0002: operation ost_connect to node 10.0.10.101@o2ib7 failed: rc = -16 [2052262.248269] LustreError: Skipped 78 previous similar messages [2052270.216536] Lustre: fir-MDT0002: Connection restored to e69c0ae9-d9c9-3930-df16-60df362bd9fc (at 10.8.24.25@o2ib6) [2052270.227058] Lustre: Skipped 515 previous similar messages [2052272.449021] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.0.10.101@o2ib7 (no target). If you are running an HA pair check that the target is mounted on the other server. [2052272.466567] LustreError: Skipped 26 previous similar messages [2052287.261813] LustreError: 11-0: fir-OST0000-osc-MDT0002: operation ost_connect to node 10.0.10.101@o2ib7 failed: rc = -16 [2052287.272857] LustreError: Skipped 19 previous similar messages [2052297.406832] Lustre: 43904:0:(ldlm_lib.c:1765:extend_recovery_timer()) fir-MDT0002: extended recovery timer reaching hard limit: 900, extend: 1 [2052311.835169] LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.0.10.51@o2ib7 (no target). If you are running an HA pair check that the target is mounted on the other server. [2052311.852620] LustreError: Skipped 48 previous similar messages [2052312.350601] LustreError: 11-0: fir-OST0000-osc-MDT0002: operation ost_connect to node 10.0.10.101@o2ib7 failed: rc = -16 [2052312.361665] LustreError: Skipped 19 previous similar messages [2052317.741529] Lustre: fir-MDT0002: Connection restored to 10.0.10.107@o2ib7 (at 10.0.10.107@o2ib7) [2052317.750497] Lustre: Skipped 100 previous similar messages [2052335.558062] Lustre: 43904:0:(ldlm_lib.c:1765:extend_recovery_timer()) fir-MDT0002: extended recovery timer reaching hard limit: 900, extend: 1 [2052335.571013] Lustre: 43904:0:(ldlm_lib.c:1765:extend_recovery_timer()) Skipped 1 previous similar message [2052336.923923] Lustre: 43904:0:(ldlm_lib.c:1765:extend_recovery_timer()) fir-MDT0002: extended recovery timer reaching hard limit: 900, extend: 1 [2052336.936871] Lustre: 43904:0:(ldlm_lib.c:1765:extend_recovery_timer()) Skipped 3 previous similar messages [2052337.439245] LustreError: 11-0: fir-OST0009-osc-MDT0002: operation ost_connect to node 10.0.10.102@o2ib7 failed: rc = -16 [2052337.450297] LustreError: Skipped 19 previous similar messages [2052338.262400] Lustre: fir-MDT0002: Recovery over after 1:40, of 1291 clients 1291 recovered and 0 were evicted. 
[2052362.527654] LustreError: 11-0: fir-OST0000-osc-MDT0002: operation ost_connect to node 10.0.10.101@o2ib7 failed: rc = -16 [2052362.538701] LustreError: Skipped 30 previous similar messages [2052387.616317] LustreError: 11-0: fir-OST0009-osc-MDT0002: operation ost_connect to node 10.0.10.102@o2ib7 failed: rc = -16 [2052387.627358] LustreError: Skipped 19 previous similar messages [2052437.793865] LustreError: 11-0: fir-OST0025-osc-MDT0002: operation ost_connect to node 10.0.10.108@o2ib7 failed: rc = -16 [2052437.804913] LustreError: Skipped 39 previous similar messages [2052513.059870] LustreError: 11-0: fir-OST0000-osc-MDT0002: operation ost_connect to node 10.0.10.101@o2ib7 failed: rc = -16 [2052513.070913] LustreError: Skipped 45 previous similar messages [2052519.243052] Lustre: fir-OST002e-osc-MDT0002: Connection restored to 10.0.10.107@o2ib7 (at 10.0.10.107@o2ib7) [2052519.253063] Lustre: Skipped 30 previous similar messages [2052564.373001] Lustre: fir-MDT0002: haven't heard from client 46317fb9-026d-9c3c-9b09-e0eda233e5d9 (at 10.9.112.11@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a90e35ea400, cur 1574884886 expire 1574884736 last 1574884659 [2053218.734783] Lustre: fir-OST002b-osc-MDT0002: Connection restored to 10.0.10.108@o2ib7 (at 10.0.10.108@o2ib7) [2053791.391277] Lustre: fir-MDT0002: haven't heard from client 06d17a9f-df4a-c46f-3ae9-bef8fc59e075 (at 10.8.27.1@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6784a44c00, cur 1574886113 expire 1574885963 last 1574885886 [2054246.399029] LNetError: 43254:0:(o2iblnd_cb.c:3350:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [2054246.409378] LNetError: 43254:0:(o2iblnd_cb.c:3425:kiblnd_check_conns()) Timed out RDMA with 10.0.10.3@o2ib7 (107): c: 7, oc: 0, rc: 8 [2054560.550464] Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) [2063472.644929] Lustre: fir-MDT0002: haven't heard from client 130574b8-92d0-de12-6ad1-04a5ca1ac364 (at 10.9.103.69@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a8e82703800, cur 1574895794 expire 1574895644 last 1574895567 [2067496.749420] Lustre: fir-MDT0002: haven't heard from client 84c0915e-d422-31a4-8bd0-7d29dad31210 (at 10.9.106.54@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a8d396eec00, cur 1574899818 expire 1574899668 last 1574899591 [2067771.186408] Lustre: fir-MDT0002: Connection restored to 46317fb9-026d-9c3c-9b09-e0eda233e5d9 (at 10.9.112.11@o2ib4) [2068252.477934] Lustre: fir-MDT0002: Connection restored to 130574b8-92d0-de12-6ad1-04a5ca1ac364 (at 10.9.103.69@o2ib4) [2068754.205023] Lustre: fir-MDT0002: Connection restored to (at 10.8.27.1@o2ib6) [2091096.373134] Lustre: fir-MDT0002: haven't heard from client 9ffa28bc-5939-f8d4-9315-34f5c7c877e9 (at 10.9.103.72@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9a91b94e6400, cur 1574923417 expire 1574923267 last 1574923190 [2146187.683239] Lustre: fir-MDT0002: Connection restored to bfffc06f-f47f-1054-b65c-dc5c6c0b83de (at 10.8.9.2@o2ib6) [2149459.200177] LNetError: 43255:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [2151667.406263] LNetError: 43263:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [2151752.844651] LNetError: 43257:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) [2151752.857415] LNetError: 43257:0:(lib-msg.c:822:lnet_is_health_check()) Skipped 3 previous similar messages [2151759.339821] Lustre: fir-MDT0002: Client 262affea-6f08-6e05-c2e8-d629eeb38f83 (at 10.9.107.19@o2ib4) reconnecting [2151759.349400] Lustre: fir-MDT0002: Connection restored to 65516a3b-056a-38c2-7e7a-c21651b8c1f8 (at 10.9.107.64@o2ib4) [2151759.360845] Lustre: Skipped 2 previous similar messages [2151759.848801] Lustre: fir-MDT0002: Client 88fa221d-7176-1083-8a20-d837893f0e22 (at 10.9.106.8@o2ib4) reconnecting [2151759.859090] Lustre: Skipped 19 previous similar messages [2151759.864604] Lustre: fir-MDT0002: Connection restored to 88fa221d-7176-1083-8a20-d837893f0e22 (at 10.9.106.8@o2ib4) [2151759.875123] Lustre: Skipped 21 previous similar messages [2152562.088765] LNetError: 43258:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [2152562.101224] LNetError: 43258:0:(lib-msg.c:822:lnet_is_health_check()) Skipped 18 previous similar messages [2152566.568984] LNetError: 43256:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [2152671.510220] LNetError: 43259:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [2162497.793788] LNetError: 43255:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [2162549.503882] LNetError: 43258:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [2162556.649084] LNetError: 43257:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [2163753.038708] Lustre: 44265:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1574996064/real 1574996064] req@ffff9a68627cba80 x1651386230395408/t0(0) o104->fir-MDT0002@10.8.0.65@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1574996071 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 [2163760.065891] Lustre: 44265:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1574996071/real 1574996071] req@ffff9a68627cba80 x1651386230395408/t0(0) o104->fir-MDT0002@10.8.0.65@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1574996078 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [2163844.913148] Lustre: fir-MDT0002: Client a24a1efe-3bf8-fd3c-c065-399cdd330bf9 (at 10.8.0.65@o2ib6) reconnecting [2163844.923347] Lustre: fir-MDT0002: Connection restored to a24a1efe-3bf8-fd3c-c065-399cdd330bf9 (at 10.8.0.65@o2ib6) [2164071.260061] Lustre: fir-MDT0002: haven't heard from client a24a1efe-3bf8-fd3c-c065-399cdd330bf9 (at 10.8.0.65@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9a91b9448c00, cur 1574996390 expire 1574996240 last 1574996163 [2164217.489131] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.8.0.65@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. [2164217.506519] LustreError: Skipped 1 previous similar message [2164231.129037] LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.0.65@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. [2164256.216276] Lustre: fir-MDT0002: Connection restored to a24a1efe-3bf8-fd3c-c065-399cdd330bf9 (at 10.8.0.65@o2ib6) [2164323.494609] Lustre: fir-MDT0002: Client a24a1efe-3bf8-fd3c-c065-399cdd330bf9 (at 10.8.0.65@o2ib6) reconnecting [2164323.504811] Lustre: fir-MDT0002: Connection restored to a24a1efe-3bf8-fd3c-c065-399cdd330bf9 (at 10.8.0.65@o2ib6) [2164457.504504] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.8.0.65@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. [2164550.271888] Lustre: fir-MDT0002: haven't heard from client a24a1efe-3bf8-fd3c-c065-399cdd330bf9 (at 10.8.0.65@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a7386598000, cur 1574996869 expire 1574996719 last 1574996642 [2164551.985546] LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.0.65@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. [2164579.781689] Lustre: fir-MDT0002: Connection restored to a24a1efe-3bf8-fd3c-c065-399cdd330bf9 (at 10.8.0.65@o2ib6) [2164655.206826] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.8.0.65@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. [2164707.319731] Lustre: fir-MDT0002: Client a24a1efe-3bf8-fd3c-c065-399cdd330bf9 (at 10.8.0.65@o2ib6) reconnecting [2164707.329927] Lustre: fir-MDT0002: Connection restored to a24a1efe-3bf8-fd3c-c065-399cdd330bf9 (at 10.8.0.65@o2ib6) [2166319.292570] Lustre: fir-MDT0002: Client a24a1efe-3bf8-fd3c-c065-399cdd330bf9 (at 10.8.0.65@o2ib6) reconnecting [2166319.302767] Lustre: fir-MDT0002: Connection restored to a24a1efe-3bf8-fd3c-c065-399cdd330bf9 (at 10.8.0.65@o2ib6) [2166349.007206] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.8.0.65@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. [2166545.325771] Lustre: fir-MDT0002: haven't heard from client a24a1efe-3bf8-fd3c-c065-399cdd330bf9 (at 10.8.0.65@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9a6c2c33e400, cur 1574998864 expire 1574998714 last 1574998637 [2171529.389503] Lustre: 68819:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1575003840/real 1575003840] req@ffff9a764f7fb180 x1651386256024960/t0(0) o104->fir-MDT0002@10.8.0.66@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1575003847 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 [2171545.248920] Lustre: 43941:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1575003855/real 1575003855] req@ffff9a8198bb5580 x1651386256030000/t0(0) o104->fir-MDT0002@10.8.0.66@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1575003863 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 [2171553.276130] Lustre: 43941:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1575003863/real 1575003863] req@ffff9a8198bb5580 x1651386256030000/t0(0) o104->fir-MDT0002@10.8.0.66@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1575003871 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [2171561.303339] Lustre: 43941:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1575003871/real 1575003871] req@ffff9a8198bb5580 x1651386256030000/t0(0) o104->fir-MDT0002@10.8.0.66@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1575003879 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [2171569.330552] Lustre: 43941:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1575003879/real 1575003879] req@ffff9a8198bb5580 x1651386256030000/t0(0) o104->fir-MDT0002@10.8.0.66@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1575003887 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [2171577.357762] Lustre: 43941:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1575003887/real 1575003887] req@ffff9a8198bb5580 x1651386256030000/t0(0) o104->fir-MDT0002@10.8.0.66@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1575003895 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [2171593.385179] Lustre: 43941:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1575003903/real 1575003903] req@ffff9a8198bb5580 x1651386256030000/t0(0) o104->fir-MDT0002@10.8.0.66@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1575003911 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [2171593.412521] Lustre: 43941:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 1 previous similar message [2171601.447383] LNetError: 43254:0:(o2iblnd_cb.c:3350:kiblnd_check_txs_locked()) Timed out tx: tx_queue, 0 seconds [2171601.457555] LNetError: 43254:0:(o2iblnd_cb.c:3425:kiblnd_check_conns()) Timed out RDMA with 10.0.10.202@o2ib7 (0): c: 0, oc: 3, rc: 8 [2171601.470005] LNetError: 43254:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. 
Health = 900 [2171601.717630] Lustre: fir-MDT0002: Client a76cca61-9f31-8203-f0f0-b5ac7feacee3 (at 10.8.24.34@o2ib6) reconnecting [2171601.727923] Lustre: fir-MDT0002: Connection restored to a76cca61-9f31-8203-f0f0-b5ac7feacee3 (at 10.8.24.34@o2ib6) [2171602.247277] Lustre: fir-MDT0002: Client 9dc649f7-c9b7-da30-4b72-3787515e419a (at 10.8.8.23@o2ib6) reconnecting [2171602.257475] Lustre: Skipped 2 previous similar messages [2171602.262893] Lustre: fir-MDT0002: Connection restored to 9dc649f7-c9b7-da30-4b72-3787515e419a (at 10.8.8.23@o2ib6) [2171602.273339] Lustre: Skipped 2 previous similar messages [2171625.423015] Lustre: 43941:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1575003935/real 1575003935] req@ffff9a8198bb5580 x1651386256030000/t0(0) o104->fir-MDT0002@10.8.0.66@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1575003943 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [2171625.450373] Lustre: 43941:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 3 previous similar messages [2171641.460442] LustreError: 43941:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) ### client (nid 10.8.0.66@o2ib6) failed to reply to blocking AST (req@ffff9a8198bb5580 x1651386256030000 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff9a6460381440/0xdeacf8e4a6e29dc8 lrc: 4/0,0 mode: PR/PR res: [0x2c0033ea3:0x9db5:0x0].0x0 bits 0x13/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.8.0.66@o2ib6 remote: 0xc21c4648799a6cf3 expref: 2997146 pid: 44296 timeout: 2171676 lvb_type: 0 [2171641.503652] LustreError: 138-a: fir-MDT0002: A client on nid 10.8.0.66@o2ib6 was evicted due to a lock blocking callback time out: rc -110 [2171641.516301] LustreError: 43485:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 105s: evicting client at 10.8.0.66@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff9a6460381440/0xdeacf8e4a6e29dc8 lrc: 3/0,0 mode: PR/PR res: [0x2c0033ea3:0x9db5:0x0].0x0 bits 0x13/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.8.0.66@o2ib6 remote: 0xc21c4648799a6cf3 expref: 2997147 pid: 44296 timeout: 0 lvb_type: 0 [2171651.422748] LustreError: 44388:0:(ldlm_lockd.c:2324:ldlm_cancel_handler()) ldlm_cancel from 10.8.0.66@o2ib6 arrived at 1575003969 with bad export cookie 16045473213564129133 [2171651.438385] LustreError: 44388:0:(ldlm_lockd.c:2324:ldlm_cancel_handler()) Skipped 8 previous similar messages [2171651.926026] LustreError: 70094:0:(ldlm_lockd.c:2324:ldlm_cancel_handler()) ldlm_cancel from 10.8.0.66@o2ib6 arrived at 1575003970 with bad export cookie 16045473213564129133 [2171651.941665] LustreError: 70094:0:(ldlm_lockd.c:2324:ldlm_cancel_handler()) Skipped 335 previous similar messages [2171652.934894] LustreError: 44236:0:(ldlm_lockd.c:2324:ldlm_cancel_handler()) ldlm_cancel from 10.8.0.66@o2ib6 arrived at 1575003971 with bad export cookie 16045473213564129133 [2171652.950530] LustreError: 44236:0:(ldlm_lockd.c:2324:ldlm_cancel_handler()) Skipped 97 previous similar messages [2171654.938566] LustreError: 69528:0:(ldlm_lockd.c:2324:ldlm_cancel_handler()) ldlm_cancel from 10.8.0.66@o2ib6 arrived at 1575003973 with bad export cookie 16045473213564129133 [2171654.954203] LustreError: 69528:0:(ldlm_lockd.c:2324:ldlm_cancel_handler()) Skipped 202 previous similar messages [2171658.947328] LustreError: 50492:0:(ldlm_lockd.c:2324:ldlm_cancel_handler()) ldlm_cancel from 10.8.0.66@o2ib6 arrived at 1575003977 with bad export cookie 16045473213564129133 [2171658.962966] LustreError: 50492:0:(ldlm_lockd.c:2324:ldlm_cancel_handler()) 
Skipped 404 previous similar messages [2171666.952628] LustreError: 69528:0:(ldlm_lockd.c:2324:ldlm_cancel_handler()) ldlm_cancel from 10.8.0.66@o2ib6 arrived at 1575003985 with bad export cookie 16045473213564129133 [2171666.968259] LustreError: 69528:0:(ldlm_lockd.c:2324:ldlm_cancel_handler()) Skipped 639 previous similar messages [2171682.959523] LustreError: 69478:0:(ldlm_lockd.c:2324:ldlm_cancel_handler()) ldlm_cancel from 10.8.0.66@o2ib6 arrived at 1575004001 with bad export cookie 16045473213564129133 [2171682.975162] LustreError: 69478:0:(ldlm_lockd.c:2324:ldlm_cancel_handler()) Skipped 1269 previous similar messages [2171714.978790] LustreError: 69482:0:(ldlm_lockd.c:2324:ldlm_cancel_handler()) ldlm_cancel from 10.8.0.66@o2ib6 arrived at 1575004033 with bad export cookie 16045473213564129133 [2171714.994429] LustreError: 69482:0:(ldlm_lockd.c:2324:ldlm_cancel_handler()) Skipped 2768 previous similar messages [2171737.534959] LNet: Service thread pid 43941 was inactive for 200.28s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: [2171737.552068] Pid: 43941, comm: mdt02_007 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 SMP Mon Aug 5 15:28:37 PDT 2019 [2171737.562418] Call Trace: [2171737.565060] [] ldlm_completion_ast+0x430/0x860 [ptlrpc] [2171737.572206] [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc] [2171737.579575] [] mdt_object_local_lock+0x438/0xb20 [mdt] [2171737.586605] [] mdt_object_lock_internal+0x70/0x360 [mdt] [2171737.593778] [] mdt_object_lock+0x20/0x30 [mdt] [2171737.600096] [] mdt_reint_open+0x106a/0x3240 [mdt] [2171737.606666] [] mdt_reint_rec+0x83/0x210 [mdt] [2171737.612903] [] mdt_reint_internal+0x6e3/0xaf0 [mdt] [2171737.619641] [] mdt_intent_open+0x82/0x3a0 [mdt] [2171737.626050] [] mdt_intent_policy+0x435/0xd80 [mdt] [2171737.632700] [] ldlm_lock_enqueue+0x356/0xa20 [ptlrpc] [2171737.639635] [] ldlm_handle_enqueue0+0xa56/0x15f0 [ptlrpc] [2171737.646915] [] tgt_enqueue+0x62/0x210 [ptlrpc] [2171737.653288] [] tgt_request_handle+0xaea/0x1580 [ptlrpc] [2171737.660400] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [2171737.668305] [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [2171737.674805] [] kthread+0xd1/0xe0 [2171737.679893] [] ret_from_fork_nospec_begin+0xe/0x21 [2171737.686547] [] 0xffffffffffffffff [2171737.691743] LustreError: dumping log to /tmp/lustre-log.1575004056.43941 [2171778.989905] LustreError: 44236:0:(ldlm_lockd.c:2324:ldlm_cancel_handler()) ldlm_cancel from 10.8.0.66@o2ib6 arrived at 1575004097 with bad export cookie 16045473213564129133 [2171779.005541] LustreError: 44236:0:(ldlm_lockd.c:2324:ldlm_cancel_handler()) Skipped 7123 previous similar messages [2171906.995658] LustreError: 70374:0:(ldlm_lockd.c:2324:ldlm_cancel_handler()) ldlm_cancel from 10.8.0.66@o2ib6 arrived at 1575004225 with bad export cookie 16045473213564129133 [2171907.011290] LustreError: 70374:0:(ldlm_lockd.c:2324:ldlm_cancel_handler()) Skipped 20273 previous similar messages [2171939.366898] LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.0.66@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
[2171941.523359] LustreError: 43941:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1575003960, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff9a7b8c8257c0/0xdeacf8e4ababcdfd lrc: 3/0,1 mode: --/CW res: [0x2c0033ea3:0x9db5:0x0].0x0 bits 0x2/0x0 rrc: 4 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 43941 timeout: 0 lvb_type: 0 [2171941.562972] LustreError: dumping log to /tmp/lustre-log.1575004260.43941 [2171961.843818] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.8.0.66@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. [2172047.478165] LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.0.66@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. [2172124.597588] Lustre: fir-MDT0002: Connection restored to fb9a2d5e-e9b3-4fb9-b988-9954fcfb0920 (at 10.8.0.66@o2ib6) [2172124.608020] Lustre: Skipped 3 previous similar messages [2172132.455440] Lustre: 44369:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff9a61b9f80d80 x1649327162211904/t0(0) o101->fcc1cf4a-a103-6faa-4cd9-4b4bc27b0479@10.9.101.58@o2ib4:465/0 lens 1800/3288 e 24 to 0 dl 1575004455 ref 2 fl Interpret:/0/0 rc 0/0 [2172138.943718] Lustre: fir-MDT0002: Client fcc1cf4a-a103-6faa-4cd9-4b4bc27b0479 (at 10.9.101.58@o2ib4) reconnecting [2172138.954067] Lustre: Skipped 3 previous similar messages [2172138.959493] Lustre: fir-MDT0002: Connection restored to fcc1cf4a-a103-6faa-4cd9-4b4bc27b0479 (at 10.9.101.58@o2ib4) [2172227.490313] LNet: Service thread pid 43941 completed after 690.22s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). [2172246.538398] Lustre: 44336:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1575004557/real 1575004557] req@ffff9a53810be780 x1651386256265088/t0(0) o104->fir-MDT0002@10.8.0.66@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1575004565 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 [2172246.565754] Lustre: 44336:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 2 previous similar messages [2172342.578913] LustreError: 44336:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) ### client (nid 10.8.0.66@o2ib6) failed to reply to blocking AST (req@ffff9a53810be780 x1651386256265088 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff9a71b4651b00/0xdeacf8e4a6d98808 lrc: 4/0,0 mode: PR/PR res: [0x2c0033ea3:0x9dbf:0x0].0x0 bits 0x13/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.8.0.66@o2ib6 remote: 0xc21c4648798ee60a expref: 972322 pid: 68769 timeout: 2172377 lvb_type: 0 [2172342.622029] LustreError: 138-a: fir-MDT0002: A client on nid 10.8.0.66@o2ib6 was evicted due to a lock blocking callback time out: rc -110 [2172342.634667] LustreError: 43485:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 104s: evicting client at 10.8.0.66@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff9a71b4651b00/0xdeacf8e4a6d98808 lrc: 3/0,0 mode: PR/PR res: [0x2c0033ea3:0x9dbf:0x0].0x0 bits 0x13/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.8.0.66@o2ib6 remote: 0xc21c4648798ee60a expref: 972242 pid: 68769 timeout: 0 lvb_type: 0 [2172351.473435] Lustre: fir-MDT0002: haven't heard from client fb9a2d5e-e9b3-4fb9-b988-9954fcfb0920 (at 10.8.0.66@o2ib6) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff9a75ef690400, cur 1575004670 expire 1575004520 last 1575004443 [2172438.993390] LNet: Service thread pid 44336 was inactive for 200.44s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: [2172439.010501] Pid: 44336, comm: mdt00_033 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 SMP Mon Aug 5 15:28:37 PDT 2019 [2172439.020843] Call Trace: [2172439.023481] [] ldlm_completion_ast+0x430/0x860 [ptlrpc] [2172439.030626] [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc] [2172439.037995] [] mdt_object_local_lock+0x438/0xb20 [mdt] [2172439.045011] [] mdt_object_lock_internal+0x70/0x360 [mdt] [2172439.052188] [] mdt_object_lock+0x20/0x30 [mdt] [2172439.058499] [] mdt_reint_open+0x106a/0x3240 [mdt] [2172439.065068] [] mdt_reint_rec+0x83/0x210 [mdt] [2172439.071306] [] mdt_reint_internal+0x6e3/0xaf0 [mdt] [2172439.078066] [] mdt_intent_open+0x82/0x3a0 [mdt] [2172439.084479] [] mdt_intent_policy+0x435/0xd80 [mdt] [2172439.091139] [] ldlm_lock_enqueue+0x356/0xa20 [ptlrpc] [2172439.098072] [] ldlm_handle_enqueue0+0xa56/0x15f0 [ptlrpc] [2172439.105354] [] tgt_enqueue+0x62/0x210 [ptlrpc] [2172439.111704] [] tgt_request_handle+0xaea/0x1580 [ptlrpc] [2172439.118813] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [2172439.126714] [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [2172439.133215] [] kthread+0xd1/0xe0 [2172439.138305] [] ret_from_fork_nospec_begin+0xe/0x21 [2172439.144973] [] 0xffffffffffffffff [2172439.150173] LustreError: dumping log to /tmp/lustre-log.1575004757.44336 [2172481.376463] Lustre: fir-MDT0002: Connection restored to a24a1efe-3bf8-fd3c-c065-399cdd330bf9 (at 10.8.0.65@o2ib6) [2172494.443555] LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.0.66@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. [2172514.854236] LNet: Service thread pid 44336 completed after 276.30s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
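The watchdog pair above, "Service thread pid 44336 was inactive for 200.44s" followed by "completed after 276.30s", brackets one stalled MDT service thread, and the binary debug dump it left at /tmp/lustre-log.1575004757.44336 can normally be decoded on the server with "lctl df <file>". A small sketch that pairs the two message types by PID to build a stall profile, again assuming a saved plain-text copy of this log (path hypothetical):

    import re
    from collections import defaultdict

    LOG = "fir-md-1-s2.console.log"  # hypothetical saved copy of this log

    inactive  = re.compile(r"Service thread pid (\d+) was inactive for ([\d.]+)s")
    completed = re.compile(r"Service thread pid (\d+) completed after ([\d.]+)s")

    events = defaultdict(list)
    for line in open(LOG):
        for kind, pat in (("inactive", inactive), ("completed", completed)):
            m = pat.search(line)
            if m:
                events[int(m.group(1))].append((kind, float(m.group(2))))

    for pid, evs in sorted(events.items()):
        done = [t for kind, t in evs if kind == "completed"]
        tail = f"completed after {max(done):.0f}s" if done else "never completed"
        print(f"pid {pid}: {len(evs)} watchdog events, {tail}")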
[2172528.957727] Lustre: 44587:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1575004839/real 1575004839] req@ffff9a6e35d62400 x1651386256361504/t0(0) o104->fir-MDT0002@10.8.0.66@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1575004847 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 [2172528.985065] Lustre: 44587:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 12 previous similar messages [2172624.998244] LustreError: 44587:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) ### client (nid 10.8.0.66@o2ib6) failed to reply to blocking AST (req@ffff9a6e35d62400 x1651386256361504 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff9a7866d35340/0xdeacf8e4a6e5cb3b lrc: 4/0,0 mode: PR/PR res: [0x2c0033ea3:0x9dc4:0x0].0x0 bits 0x13/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.8.0.66@o2ib6 remote: 0xc21c4648799dcfcd expref: 594699 pid: 44250 timeout: 2172660 lvb_type: 0 [2172625.041361] LustreError: 138-a: fir-MDT0002: A client on nid 10.8.0.66@o2ib6 was evicted due to a lock blocking callback time out: rc -110 [2172625.053982] LustreError: 43485:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 104s: evicting client at 10.8.0.66@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff9a7866d35340/0xdeacf8e4a6e5cb3b lrc: 3/0,0 mode: PR/PR res: [0x2c0033ea3:0x9dc4:0x0].0x0 bits 0x13/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.8.0.66@o2ib6 remote: 0xc21c4648799dcfcd expref: 594632 pid: 44250 timeout: 0 lvb_type: 0 [2172716.371978] Lustre: fir-MDT0002: Connection restored to fb9a2d5e-e9b3-4fb9-b988-9954fcfb0920 (at 10.8.0.66@o2ib6) [2172717.252200] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.8.0.66@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. [2172721.624728] LNet: Service thread pid 44587 was inactive for 200.66s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: [2172721.641841] Pid: 44587, comm: mdt01_073 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 SMP Mon Aug 5 15:28:37 PDT 2019 [2172721.652187] Call Trace: [2172721.654833] [] ldlm_completion_ast+0x430/0x860 [ptlrpc] [2172721.661961] [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc] [2172721.669329] [] mdt_object_local_lock+0x438/0xb20 [mdt] [2172721.676330] [] mdt_object_lock_internal+0x70/0x360 [mdt] [2172721.683508] [] mdt_object_lock+0x20/0x30 [mdt] [2172721.689816] [] mdt_reint_open+0x106a/0x3240 [mdt] [2172721.696385] [] mdt_reint_rec+0x83/0x210 [mdt] [2172721.702608] [] mdt_reint_internal+0x6e3/0xaf0 [mdt] [2172721.709352] [] mdt_intent_open+0x82/0x3a0 [mdt] [2172721.715747] [] mdt_intent_policy+0x435/0xd80 [mdt] [2172721.722405] [] ldlm_lock_enqueue+0x356/0xa20 [ptlrpc] [2172721.729346] [] ldlm_handle_enqueue0+0xa56/0x15f0 [ptlrpc] [2172721.736626] [] tgt_enqueue+0x62/0x210 [ptlrpc] [2172721.742970] [] tgt_request_handle+0xaea/0x1580 [ptlrpc] [2172721.750085] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [2172721.757973] [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [2172721.764474] [] kthread+0xd1/0xe0 [2172721.769560] [] ret_from_fork_nospec_begin+0xe/0x21 [2172721.776209] [] 0xffffffffffffffff [2172721.781401] LustreError: dumping log to /tmp/lustre-log.1575005040.44587 [2172925.061021] LustreError: 44587:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1575004943, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff9a6c8c9b2880/0xdeacf8e4b05a498c lrc: 3/0,1 mode: --/CW res: [0x2c0033ea3:0x9dc4:0x0].0x0 bits 0x2/0x0 rrc: 4 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 44587 timeout: 0 lvb_type: 0 [2173092.679428] LNet: Service thread pid 44587 completed after 571.70s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). [2196417.183420] Lustre: fir-MDT0002: Connection restored to 88e820c8-82a8-e8a8-9d73-bb67d48d5644 (at 10.8.23.28@o2ib6) [2196436.104484] Lustre: fir-MDT0002: haven't heard from client 88e820c8-82a8-e8a8-9d73-bb67d48d5644 (at 10.8.23.28@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9a8ec7f72000, cur 1575028754 expire 1575028604 last 1575028527 [2198352.878085] LNetError: 43258:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) [2198357.851147] Lustre: fir-MDT0002: Client a100ec04-4cce-848c-9946-1d205818cd80 (at 10.9.105.67@o2ib4) reconnecting [2198357.861524] Lustre: fir-MDT0002: Connection restored to a100ec04-4cce-848c-9946-1d205818cd80 (at 10.9.105.67@o2ib4) [2198359.438794] Lustre: fir-MDT0002: Client d7b23752-f1f0-8c4b-6c13-f8cb8f537c71 (at 10.9.109.62@o2ib4) reconnecting [2198359.449165] Lustre: fir-MDT0002: Connection restored to d7b23752-f1f0-8c4b-6c13-f8cb8f537c71 (at 10.9.109.62@o2ib4) [2198360.444219] Lustre: fir-MDT0002: Client 287bbc14-4740-0174-db3b-c222cf1fb93e (at 10.9.104.42@o2ib4) reconnecting [2198360.454572] Lustre: Skipped 2 previous similar messages [2198360.459987] Lustre: fir-MDT0002: Connection restored to 287bbc14-4740-0174-db3b-c222cf1fb93e (at 10.9.104.42@o2ib4) [2198360.470610] Lustre: Skipped 2 previous similar messages [2198361.464835] LNetError: 43259:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) [2198362.445933] Lustre: fir-MDT0002: Client f0a8ec9b-fbf5-a8d2-cba4-506dafb70319 (at 10.9.110.5@o2ib4) reconnecting [2198362.456196] Lustre: Skipped 52 previous similar messages [2198362.461723] Lustre: fir-MDT0002: Connection restored to f0a8ec9b-fbf5-a8d2-cba4-506dafb70319 (at 10.9.110.5@o2ib4) [2198362.472273] Lustre: Skipped 52 previous similar messages [2198367.029762] Lustre: fir-MDT0002: Client 6bdde767-e980-edfb-a0f5-b03ac49e6985 (at 10.9.102.42@o2ib4) reconnecting [2198367.040130] Lustre: Skipped 59 previous similar messages [2198367.045645] Lustre: fir-MDT0002: Connection restored to 6bdde767-e980-edfb-a0f5-b03ac49e6985 (at 10.9.102.42@o2ib4) [2198367.056257] Lustre: Skipped 59 previous similar messages [2198368.811616] Lustre: 44596:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1575030679/real 0] req@ffff9a719532d100 x1651386267852576/t0(0) o104->fir-MDT0002@10.9.117.41@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1575030686 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 [2198368.838350] Lustre: 44596:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 12 previous similar messages [2198436.904500] LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.9.108.23@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. [2198436.922043] LustreError: Skipped 1 previous similar message [2198441.074321] LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.9.110.7@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. [2198441.091780] LustreError: Skipped 54 previous similar messages [2198460.459133] Lustre: fir-MDT0002: Client fbb242ce-28df-aaee-d662-445a250a29ad (at 10.9.105.6@o2ib4) reconnecting [2198460.469401] Lustre: Skipped 26 previous similar messages [2198460.474923] Lustre: fir-MDT0002: Connection restored to fbb242ce-28df-aaee-d662-445a250a29ad (at 10.9.105.6@o2ib4) [2198460.485463] Lustre: Skipped 26 previous similar messages [2215025.590811] Lustre: fir-MDT0002: haven't heard from client 26ac399e-0ec5-9036-2064-cf367835d3cc (at 10.9.107.24@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9a91b944ec00, cur 1575047343 expire 1575047193 last 1575047116 [2224774.113412] Lustre: fir-MDT0002: Connection restored to 793bfa9b-0133-cab6-bd0d-2f6b5f3d32dc (at 10.8.26.4@o2ib6) [2224774.123853] Lustre: Skipped 62 previous similar messages [2224781.844334] Lustre: fir-MDT0002: haven't heard from client 793bfa9b-0133-cab6-bd0d-2f6b5f3d32dc (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a903ab79800, cur 1575057099 expire 1575056949 last 1575056872 [2225349.446006] Lustre: fir-MDT0002: Connection restored to 793bfa9b-0133-cab6-bd0d-2f6b5f3d32dc (at 10.8.26.4@o2ib6) [2225378.860361] Lustre: fir-MDT0002: haven't heard from client c19db27f-fe1b-a92b-e998-c198c3b9a300 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a8168975800, cur 1575057696 expire 1575057546 last 1575057469 [2226165.322998] Lustre: fir-MDT0002: Connection restored to 793bfa9b-0133-cab6-bd0d-2f6b5f3d32dc (at 10.8.26.4@o2ib6) [2226167.880248] Lustre: fir-MDT0002: haven't heard from client a2a60d30-3451-a31c-7343-818d5f18f34a (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a7eb62a7800, cur 1575058485 expire 1575058335 last 1575058258 [2226868.031458] Lustre: fir-MDT0002: Connection restored to 793bfa9b-0133-cab6-bd0d-2f6b5f3d32dc (at 10.8.26.4@o2ib6) [2226869.897592] Lustre: fir-MDT0002: haven't heard from client 78429e66-1836-0fba-8a2a-6e5dcfb84f54 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a70069e4c00, cur 1575059187 expire 1575059037 last 1575058960 [2227532.590453] Lustre: fir-MDT0002: Connection restored to 793bfa9b-0133-cab6-bd0d-2f6b5f3d32dc (at 10.8.26.4@o2ib6) [2227572.915843] Lustre: fir-MDT0002: haven't heard from client fb742ad2-d0ff-7d99-f9c1-d7aac889be69 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6f870c0c00, cur 1575059890 expire 1575059740 last 1575059663 [2228312.936176] Lustre: fir-MDT0002: haven't heard from client b8d6b16d-d8c6-89cb-6214-df060c6a77b6 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a63afe18c00, cur 1575060630 expire 1575060480 last 1575060403 [2228323.155158] Lustre: fir-MDT0002: Connection restored to 793bfa9b-0133-cab6-bd0d-2f6b5f3d32dc (at 10.8.26.4@o2ib6) [2229570.517438] Lustre: fir-MDT0002: Connection restored to 793bfa9b-0133-cab6-bd0d-2f6b5f3d32dc (at 10.8.26.4@o2ib6) [2229604.970855] Lustre: fir-MDT0002: haven't heard from client a77552a2-f798-6859-3124-f0e4ef609271 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a66b51e1c00, cur 1575061922 expire 1575061772 last 1575061695 [2230049.707377] Lustre: fir-MDT0002: Connection restored to 793bfa9b-0133-cab6-bd0d-2f6b5f3d32dc (at 10.8.26.4@o2ib6) [2230087.982712] Lustre: fir-MDT0002: haven't heard from client b0b513f5-f566-1793-6b5f-f7ddc441f0bc (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a5e3a0df400, cur 1575062405 expire 1575062255 last 1575062178 [2230914.402202] Lustre: fir-MDT0002: Connection restored to 793bfa9b-0133-cab6-bd0d-2f6b5f3d32dc (at 10.8.26.4@o2ib6) [2230918.003654] Lustre: fir-MDT0002: haven't heard from client 364ba98d-98b4-387f-a82b-9bd3bbf04a31 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9a6e33347c00, cur 1575063235 expire 1575063085 last 1575063008 [2232522.047226] Lustre: fir-MDT0002: haven't heard from client 79f0d186-2d9d-136d-e32a-c6b83be61613 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a55b803d400, cur 1575064839 expire 1575064689 last 1575064612 [2232539.937883] Lustre: fir-MDT0002: Connection restored to 793bfa9b-0133-cab6-bd0d-2f6b5f3d32dc (at 10.8.26.4@o2ib6) [2233091.688328] Lustre: fir-MDT0002: Connection restored to 793bfa9b-0133-cab6-bd0d-2f6b5f3d32dc (at 10.8.26.4@o2ib6) [2233119.062000] Lustre: fir-MDT0002: haven't heard from client 77954fb0-eb30-ff9c-f981-c7390fd32b5c (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a7baeeee400, cur 1575065436 expire 1575065286 last 1575065209 [2234149.875610] LNetError: 43255:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [2247638.727915] LNetError: 43270:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [2247748.478897] Lustre: fir-MDT0002: Connection restored to 637f5748-1417-d3fd-5a8f-53cf6ef775d0 (at 10.8.23.14@o2ib6) [2247777.440097] Lustre: fir-MDT0002: haven't heard from client 637f5748-1417-d3fd-5a8f-53cf6ef775d0 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a7062526c00, cur 1575080094 expire 1575079944 last 1575079867 [2251692.783393] Lustre: fir-MDT0002: Connection restored to 637f5748-1417-d3fd-5a8f-53cf6ef775d0 (at 10.8.23.14@o2ib6) [2251739.543358] Lustre: fir-MDT0002: haven't heard from client a96f7db7-987a-7c25-4c3c-9e482c0fe73b (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a8870f6a400, cur 1575084056 expire 1575083906 last 1575083829 [2253566.806003] Lustre: fir-MDT0002: Connection restored to 637f5748-1417-d3fd-5a8f-53cf6ef775d0 (at 10.8.23.14@o2ib6) [2253590.592987] Lustre: fir-MDT0002: haven't heard from client a324267a-8769-6f56-ec32-ac25bd2c1ee2 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6ca4398c00, cur 1575085907 expire 1575085757 last 1575085680 [2259432.169789] Lustre: fir-MDT0002: Connection restored to d3f33066-55c8-8f52-6260-5618395fc5ce (at 10.9.101.40@o2ib4) [2259483.747920] Lustre: fir-MDT0002: haven't heard from client d3f33066-55c8-8f52-6260-5618395fc5ce (at 10.9.101.40@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a85bf004000, cur 1575091800 expire 1575091650 last 1575091573 [2260824.638546] Lustre: 68805:0:(mdd_device.c:1807:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 3: -22 [2260825.137873] Lustre: 44418:0:(mdd_device.c:1807:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 3: -22 [2260825.149702] Lustre: 44418:0:(mdd_device.c:1807:mdd_changelog_clear()) Skipped 996 previous similar messages [2260826.137969] Lustre: 43942:0:(mdd_device.c:1807:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 3: -22 [2260826.149796] Lustre: 43942:0:(mdd_device.c:1807:mdd_changelog_clear()) Skipped 1690 previous similar messages [2268454.783771] Lustre: fir-MDT0002: Connection restored to 793bfa9b-0133-cab6-bd0d-2f6b5f3d32dc (at 10.8.26.4@o2ib6) [2268493.982135] Lustre: fir-MDT0002: haven't heard from client dc8818ab-f8cb-f207-4605-acfeed8ad5e8 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9a76da259000, cur 1575100810 expire 1575100660 last 1575100583 [2268819.764847] Lustre: fir-MDT0002: Connection restored to 793bfa9b-0133-cab6-bd0d-2f6b5f3d32dc (at 10.8.26.4@o2ib6) [2268858.992517] Lustre: fir-MDT0002: haven't heard from client 13bfc90c-824e-bab0-5bba-055ac3285686 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6ca006fc00, cur 1575101175 expire 1575101025 last 1575100948 [2269632.201256] Lustre: fir-MDT0002: Connection restored to 793bfa9b-0133-cab6-bd0d-2f6b5f3d32dc (at 10.8.26.4@o2ib6) [2269662.013924] Lustre: fir-MDT0002: haven't heard from client 279d9365-a127-825e-c175-8ca4332da3af (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a5e5dbdfc00, cur 1575101978 expire 1575101828 last 1575101751 [2270200.040301] Lustre: fir-MDT0002: haven't heard from client 6c89c650-0f73-0af8-035a-faf8ceb0d888 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a8168cbec00, cur 1575102516 expire 1575102366 last 1575102289 [2270205.102270] Lustre: fir-MDT0002: Connection restored to 793bfa9b-0133-cab6-bd0d-2f6b5f3d32dc (at 10.8.26.4@o2ib6) [2270836.185084] Lustre: fir-MDT0002: Connection restored to 793bfa9b-0133-cab6-bd0d-2f6b5f3d32dc (at 10.8.26.4@o2ib6) [2270873.044718] Lustre: fir-MDT0002: haven't heard from client 088e6e8c-9808-8b5f-512f-d94ca1f439c7 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a63724c6000, cur 1575103189 expire 1575103039 last 1575102962 [2271631.343047] Lustre: fir-MDT0002: Connection restored to 793bfa9b-0133-cab6-bd0d-2f6b5f3d32dc (at 10.8.26.4@o2ib6) [2271641.065127] Lustre: fir-MDT0002: haven't heard from client 8e55154e-0445-ce79-11e5-99f373d4dc4d (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6379040000, cur 1575103957 expire 1575103807 last 1575103730 [2289195.520075] Lustre: fir-MDT0002: haven't heard from client c87dfc6f-f357-3fc6-276a-c37f83305e19 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6331473c00, cur 1575121511 expire 1575121361 last 1575121284 [2289213.769569] Lustre: fir-MDT0002: Connection restored to 793bfa9b-0133-cab6-bd0d-2f6b5f3d32dc (at 10.8.26.4@o2ib6) [2289679.872373] Lustre: fir-MDT0002: Connection restored to 793bfa9b-0133-cab6-bd0d-2f6b5f3d32dc (at 10.8.26.4@o2ib6) [2289731.533106] Lustre: fir-MDT0002: haven't heard from client 90971dd9-6a25-b1db-31d4-133090d6446b (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a53918f7800, cur 1575122047 expire 1575121897 last 1575121820 [2290088.675036] Lustre: fir-MDT0002: Connection restored to 793bfa9b-0133-cab6-bd0d-2f6b5f3d32dc (at 10.8.26.4@o2ib6) [2290108.580875] Lustre: fir-MDT0002: haven't heard from client 8e72176d-2d3f-8d95-a4b6-b2eca9e6f92d (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a5d81f73800, cur 1575122424 expire 1575122274 last 1575122197 [2290515.667433] Lustre: fir-MDT0002: Connection restored to 793bfa9b-0133-cab6-bd0d-2f6b5f3d32dc (at 10.8.26.4@o2ib6) [2290517.553663] Lustre: fir-MDT0002: haven't heard from client 003fbeab-29b2-6cee-ecb7-481f3a3fe3e2 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a79219a0800, cur 1575122833 expire 1575122683 last 1575122606 [2293011.619322] Lustre: fir-MDT0002: haven't heard from client 1e90d038-eb33-7c3d-5a1f-22732747cb26 (at 10.8.26.4@o2ib6) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff9a6cd0482000, cur 1575125327 expire 1575125177 last 1575125100 [2293014.839572] Lustre: fir-MDT0002: Connection restored to 793bfa9b-0133-cab6-bd0d-2f6b5f3d32dc (at 10.8.26.4@o2ib6) [2293868.737189] Lustre: fir-MDT0002: Connection restored to 793bfa9b-0133-cab6-bd0d-2f6b5f3d32dc (at 10.8.26.4@o2ib6) [2293895.643559] Lustre: fir-MDT0002: haven't heard from client 2d1ec3c7-8b3a-4d7c-7796-4fe4bd114711 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a773e6ddc00, cur 1575126211 expire 1575126061 last 1575125984 [2343193.902142] Lustre: fir-MDT0002: haven't heard from client 3b0a89b7-cc70-d975-2d52-92896d01d45c (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a8d396ebc00, cur 1575175508 expire 1575175358 last 1575175281 [2360482.837213] Lustre: fir-MDT0002: Connection restored to 793bfa9b-0133-cab6-bd0d-2f6b5f3d32dc (at 10.8.26.4@o2ib6) [2360494.355265] Lustre: fir-MDT0002: haven't heard from client 94e8a702-3db2-e1cc-b3b1-f194f7b293bf (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6f09adc000, cur 1575192808 expire 1575192658 last 1575192581 [2360662.576422] Lustre: fir-MDT0002: Connection restored to 793bfa9b-0133-cab6-bd0d-2f6b5f3d32dc (at 10.8.26.4@o2ib6) [2360709.353876] Lustre: fir-MDT0002: haven't heard from client 8162fbe9-280f-08ab-1273-9a51b879beb5 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a52c0703000, cur 1575193023 expire 1575192873 last 1575192796 [2360848.528046] Lustre: fir-MDT0002: Connection restored to 793bfa9b-0133-cab6-bd0d-2f6b5f3d32dc (at 10.8.26.4@o2ib6) [2360889.358651] Lustre: fir-MDT0002: haven't heard from client 81e58f8b-aa4b-d13a-8d04-23b3ecd61372 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a7a85f80800, cur 1575193203 expire 1575193053 last 1575192976 [2367496.536605] Lustre: fir-MDT0002: haven't heard from client b887c805-6a80-adc0-4c15-9655e6026d27 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a7fb0479000, cur 1575199810 expire 1575199660 last 1575199583 [2367519.602024] Lustre: fir-MDT0002: Connection restored to 637f5748-1417-d3fd-5a8f-53cf6ef775d0 (at 10.8.23.14@o2ib6) [2373453.689099] Lustre: fir-MDT0002: Connection restored to 637f5748-1417-d3fd-5a8f-53cf6ef775d0 (at 10.8.23.14@o2ib6) [2373480.705409] Lustre: fir-MDT0002: haven't heard from client 7c52d36e-33b0-3099-344d-5aeec5ccfae8 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a55b8298c00, cur 1575205794 expire 1575205644 last 1575205567 [2389134.363297] Lustre: fir-MDT0002: Connection restored to 637f5748-1417-d3fd-5a8f-53cf6ef775d0 (at 10.8.23.14@o2ib6) [2389175.099201] Lustre: fir-MDT0002: haven't heard from client ee02fc69-a3ec-4e08-9e6f-75b0a95411bd (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a7167ed0800, cur 1575221488 expire 1575221338 last 1575221261 [2390680.130044] Lustre: fir-MDT0002: haven't heard from client bfa5427c-32ed-52cb-da1b-cdffa2c9571d (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9a6d1310f000, cur 1575222993 expire 1575222843 last 1575222766 [2390692.163523] Lustre: fir-MDT0002: Connection restored to 637f5748-1417-d3fd-5a8f-53cf6ef775d0 (at 10.8.23.14@o2ib6) [2398416.456728] LNetError: 43271:0:(peer.c:3451:lnet_peer_ni_add_to_recoveryq_locked()) lpni 10.0.10.209@o2ib7 added to recovery queue. Health = 900 [2398416.456739] LustreError: 68874:0:(ldlm_lib.c:3271:target_bulk_io()) @@@ truncated bulk READ 0(4096) req@ffff9a79a3402400 x1649053344272192/t0(0) o37->cec884d3-ca4b-8127-2f6b-7762665aa5f8@10.9.0.64@o2ib4:259/0 lens 448/440 e 1 to 0 dl 1575230749 ref 1 fl Interpret:/0/0 rc 0/0 [2398467.022058] Lustre: fir-MDT0002: Client cec884d3-ca4b-8127-2f6b-7762665aa5f8 (at 10.9.0.64@o2ib4) reconnecting [2398467.032277] Lustre: Skipped 62 previous similar messages [2398467.037819] Lustre: fir-MDT0002: Connection restored to (at 10.9.0.64@o2ib4) [2398492.441346] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.0.64@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. [2398492.458716] LustreError: Skipped 5 previous similar messages [2398576.941222] LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.9.0.64@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. [2398591.007196] Lustre: fir-MDT0002: Client cec884d3-ca4b-8127-2f6b-7762665aa5f8 (at 10.9.0.64@o2ib4) reconnecting [2398591.017397] Lustre: fir-MDT0002: Connection restored to (at 10.9.0.64@o2ib4) [2398663.709617] LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.9.0.64@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. [2398688.798476] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.0.64@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. [2424665.020553] Lustre: fir-MDT0002: haven't heard from client e80b82d2-d23f-808a-e866-7a0081a62a2c (at 10.9.114.3@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9a87a065d000, cur 1575256977 expire 1575256827 last 1575256750 [2434618.797547] Lustre: 68806:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1575266923/real 1575266923] req@ffff9a5afb748d80 x1651386597779664/t0(0) o104->fir-MDT0002@10.9.112.10@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1575266930 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 [2434625.824718] Lustre: 68806:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1575266930/real 1575266930] req@ffff9a5afb748d80 x1651386597779664/t0(0) o104->fir-MDT0002@10.9.112.10@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1575266937 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [2434625.852231] Lustre: 68806:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 2 previous similar messages [2434639.862069] Lustre: 68806:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1575266944/real 1575266944] req@ffff9a5afb748d80 x1651386597779664/t0(0) o104->fir-MDT0002@10.9.112.10@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1575266951 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [2434639.889586] Lustre: 68806:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 5 previous similar messages [2434660.899598] Lustre: 68806:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1575266965/real 1575266965] req@ffff9a5afb748d80 x1651386597779664/t0(0) o104->fir-MDT0002@10.9.112.10@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1575266972 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [2434660.927114] Lustre: 68806:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 8 previous similar messages [2434695.937466] Lustre: 68806:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1575267000/real 1575267000] req@ffff9a5afb748d80 x1651386597779664/t0(0) o104->fir-MDT0002@10.9.112.10@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1575267007 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [2434695.964978] Lustre: 68806:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 14 previous similar messages [2434716.975023] LustreError: 68806:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) ### client (nid 10.9.112.10@o2ib4) failed to reply to blocking AST (req@ffff9a5afb748d80 x1651386597779664 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff9a52e3d4c380/0xdeacf8e9a9e41e87 lrc: 4/0,0 mode: PR/PR res: [0x2c002ea82:0x669:0x0].0x0 bits 0x13/0x0 rrc: 10 type: IBT flags: 0x60200400000020 nid: 10.9.112.10@o2ib4 remote: 0xfe867d59102b3a7f expref: 528 pid: 44566 timeout: 2434746 lvb_type: 0 [2434717.018229] LustreError: 138-a: fir-MDT0002: A client on nid 10.9.112.10@o2ib4 was evicted due to a lock blocking callback time out: rc -110 [2434717.031027] LustreError: 43485:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 105s: evicting client at 10.9.112.10@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff9a52e3d4c380/0xdeacf8e9a9e41e87 lrc: 3/0,0 mode: PR/PR res: [0x2c002ea82:0x669:0x0].0x0 bits 0x13/0x0 rrc: 11 type: IBT flags: 0x60200400000020 nid: 10.9.112.10@o2ib4 remote: 0xfe867d59102b3a7f expref: 529 pid: 44566 timeout: 0 lvb_type: 0 [2434764.247202] Lustre: fir-MDT0002: haven't heard from client 18ac70aa-c43d-b408-39e6-223aef24789b (at 10.9.113.8@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9a70d1ff8c00, cur 1575267076 expire 1575266926 last 1575266849 [2434776.192954] Lustre: fir-MDT0002: Connection restored to 846d2a0d-e04f-8b8f-c45f-232742f397ba (at 10.9.114.8@o2ib4) [2434783.006073] Lustre: fir-MDT0002: Connection restored to a7bc434a-8373-f8b8-14a4-c02f18edba81 (at 10.9.112.13@o2ib4) [2434840.249159] Lustre: fir-MDT0002: haven't heard from client f8d40671-c236-16c2-661a-a56bf75ff770 (at 10.8.27.21@o2ib6) in 169 seconds. I think it's dead, and I am evicting it. exp ffff9a6779626000, cur 1575267152 expire 1575267002 last 1575266983 [2434840.271038] Lustre: Skipped 10 previous similar messages [2434931.591523] LustreError: 75524:0:(ldlm_lockd.c:2324:ldlm_cancel_handler()) ldlm_cancel from 10.8.27.21@o2ib6 arrived at 1575267243 with bad export cookie 16045473213564131744 [2434931.607250] LustreError: 75524:0:(ldlm_lockd.c:2324:ldlm_cancel_handler()) Skipped 22481 previous similar messages [2434931.618105] Lustre: fir-MDT0002: Connection restored to f8d40671-c236-16c2-661a-a56bf75ff770 (at 10.8.27.21@o2ib6) [2435158.260707] Lustre: fir-MDT0002: haven't heard from client f8d40671-c236-16c2-661a-a56bf75ff770 (at 10.8.27.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a664ef37c00, cur 1575267470 expire 1575267320 last 1575267243 [2474413.253642] Lustre: fir-MDT0002: haven't heard from client 2de74503-c521-66c7-1aa4-3974c8cc0e5f (at 10.9.114.12@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a91b22bc400, cur 1575306724 expire 1575306574 last 1575306497 [2475439.510995] Lustre: fir-MDT0002: Connection restored to d42d660f-d79d-989c-bf6c-f667c28085e7 (at 10.9.115.8@o2ib4) [2475444.980776] Lustre: fir-MDT0002: Connection restored to 8842a09b-a155-b29c-efe9-6e47b76a8b23 (at 10.9.112.17@o2ib4) [2475473.469142] Lustre: fir-MDT0002: Connection restored to 80a8792f-a989-a169-9080-96907468b701 (at 10.9.113.10@o2ib4) [2475483.796388] Lustre: fir-MDT0002: Connection restored to a48cfbbe-65f9-a8de-a959-4163f04b9d42 (at 10.9.115.4@o2ib4) [2475483.806918] Lustre: Skipped 1 previous similar message [2475496.925172] Lustre: fir-MDT0002: Connection restored to d0ee71bb-1bb2-f91e-545f-0c28c14b006b (at 10.9.114.13@o2ib4) [2475512.746002] Lustre: fir-MDT0002: Connection restored to 73571755-d09b-c045-22ef-0bc178d2b7f8 (at 10.9.112.10@o2ib4) [2475549.530208] Lustre: fir-MDT0002: Connection restored to 04000acf-43e7-95df-0031-b79edd6129c1 (at 10.9.114.10@o2ib4) [2475640.257228] Lustre: fir-MDT0002: Connection restored to 2de74503-c521-66c7-1aa4-3974c8cc0e5f (at 10.9.114.12@o2ib4) [2475881.198511] Lustre: fir-MDT0002: Connection restored to bfffc06f-f47f-1054-b65c-dc5c6c0b83de (at 10.8.9.2@o2ib6) [2476055.575358] Lustre: fir-MDT0002: Connection restored to 9ffa28bc-5939-f8d4-9315-34f5c7c877e9 (at 10.9.103.72@o2ib4) [2476313.933018] Lustre: fir-MDT0002: Connection restored to (at 10.9.103.31@o2ib4) [2476313.940510] Lustre: Skipped 2 previous similar messages [2479920.395284] Lustre: fir-MDT0002: haven't heard from client 25a4c1e1-bbca-585a-45fd-44ab3b7da8c4 (at 10.9.108.46@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a91bedf3000, cur 1575312231 expire 1575312081 last 1575312004 [2480638.419141] Lustre: fir-MDT0002: haven't heard from client 3b3caf4b-f935-136a-31c3-4f8236e9a587 (at 10.9.105.55@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9a91b22be800, cur 1575312949 expire 1575312799 last 1575312722 [2480638.441110] Lustre: Skipped 17 previous similar messages [2481043.016400] Lustre: fir-MDT0002: Connection restored to (at 10.9.113.4@o2ib4) [2481043.023803] Lustre: Skipped 6 previous similar messages [2481541.993409] Lustre: fir-MDT0002: Connection restored to d9a680cf-f48f-730f-ae55-619c940ab227 (at 10.9.110.46@o2ib4) [2481542.004019] Lustre: Skipped 3 previous similar messages [2481771.811738] Lustre: fir-MDT0002: Connection restored to 74802258-a1d9-525d-38fd-e9a5d8c30bac (at 10.9.108.52@o2ib4) [2481771.822351] Lustre: Skipped 3 previous similar messages [2482165.507251] Lustre: fir-MDT0002: Connection restored to 2395d511-16d0-1290-9e8f-9516bca9ed5b (at 10.9.101.5@o2ib4) [2482165.517775] Lustre: Skipped 1 previous similar message [2482952.339342] Lustre: fir-MDT0002: Connection restored to 3b3caf4b-f935-136a-31c3-4f8236e9a587 (at 10.9.105.55@o2ib4) [2482952.349948] Lustre: Skipped 7 previous similar messages [2486256.557718] Lustre: fir-MDT0002: haven't heard from client 00449cc2-ca63-775d-8e81-cc58554d4ea0 (at 10.9.102.17@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6c053afc00, cur 1575318567 expire 1575318417 last 1575318340 [2517193.527233] Lustre: fir-MDT0002: Connection restored to bfffc06f-f47f-1054-b65c-dc5c6c0b83de (at 10.8.9.2@o2ib6) [2558661.432503] Lustre: fir-MDT0002: haven't heard from client d1542b28-5e7a-e84d-4f8a-41f687fce618 (at 10.9.112.12@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a8e82703000, cur 1575390970 expire 1575390820 last 1575390743 [2560615.209056] Lustre: fir-MDT0002: Connection restored to d1542b28-5e7a-e84d-4f8a-41f687fce618 (at 10.9.112.12@o2ib4) [2567189.449952] Lustre: fir-MDT0002: Connection restored to f8d40671-c236-16c2-661a-a56bf75ff770 (at 10.8.27.21@o2ib6) [2582490.045611] Lustre: fir-MDT0002: haven't heard from client d26073b0-219e-7b9c-1089-8ca32a59a1ec (at 10.9.114.4@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9a6779627800, cur 1575414798 expire 1575414648 last 1575414571 [2584164.466760] Lustre: fir-MDT0002: Connection restored to d26073b0-219e-7b9c-1089-8ca32a59a1ec (at 10.9.114.4@o2ib4) [2584426.528332] Lustre: fir-MDT0002: Connection restored to 7bfc4bab-9b9f-aeec-6b2e-a891c2cd963d (at 10.8.25.17@o2ib6) [2614949.042182] LNetError: 43255:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) [2614951.294997] Lustre: fir-MDT0002: Client a3141757-3cde-b831-3c68-2858596623cc (at 10.8.28.1@o2ib6) reconnecting [2614951.305201] Lustre: fir-MDT0002: Connection restored to a3141757-3cde-b831-3c68-2858596623cc (at 10.8.28.1@o2ib6) [2614952.122557] Lustre: fir-MDT0002: Client 72b66a84-eb6d-8862-b24a-97d6ffec93b7 (at 10.8.24.22@o2ib6) reconnecting [2614952.132836] Lustre: fir-MDT0002: Connection restored to 72b66a84-eb6d-8862-b24a-97d6ffec93b7 (at 10.8.24.22@o2ib6) [2614952.143358] Lustre: Skipped 1 previous similar message [2614953.265580] Lustre: fir-MDT0002: Client 5f7389bd-7341-c3d9-6958-0dafa5727862 (at 10.8.21.19@o2ib6) reconnecting [2614953.275853] Lustre: Skipped 9 previous similar messages [2614953.281282] Lustre: fir-MDT0002: Connection restored to 5f7389bd-7341-c3d9-6958-0dafa5727862 (at 10.8.21.19@o2ib6) [2614953.291821] Lustre: Skipped 8 previous similar messages [2614955.272852] Lustre: fir-MDT0002: Client fb63a42c-93f0-576d-f57c-a83fc4375277 (at 10.8.21.2@o2ib6) reconnecting [2614955.283030] Lustre: Skipped 13 previous similar messages [2614955.288549] Lustre: fir-MDT0002: Connection restored to fb63a42c-93f0-576d-f57c-a83fc4375277 (at 10.8.21.2@o2ib6) [2614955.299018] Lustre: Skipped 13 previous similar messages [2614959.284156] Lustre: fir-MDT0002: Client 268c459d-35da-6a96-ea14-02bf68273573 (at 10.8.19.7@o2ib6) reconnecting [2614959.294337] Lustre: Skipped 98 previous similar messages [2614959.299866] Lustre: fir-MDT0002: Connection restored to 268c459d-35da-6a96-ea14-02bf68273573 (at 10.8.19.7@o2ib6) [2614959.310327] Lustre: Skipped 98 previous similar messages [2614967.459035] Lustre: fir-MDT0002: Client 9e6019b2-e72a-be9a-07e3-b4bb84e4d17c (at 10.8.30.29@o2ib6) reconnecting [2614967.469299] Lustre: Skipped 179 previous similar messages [2614967.474904] Lustre: fir-MDT0002: Connection restored to 9e6019b2-e72a-be9a-07e3-b4bb84e4d17c (at 10.8.30.29@o2ib6) [2614967.485436] Lustre: Skipped 179 previous similar messages [2614969.368767] LNetError: 43271:0:(peer.c:3451:lnet_peer_ni_add_to_recoveryq_locked()) lpni 10.0.10.201@o2ib7 added to recovery queue. Health = 900 [2614974.345894] LNetError: 43255:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) [2614976.479381] LNetError: 43255:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) [2614976.492071] LNetError: 43255:0:(lib-msg.c:822:lnet_is_health_check()) Skipped 3 previous similar messages [2614991.750161] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.8.0.67@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. [2615041.863449] LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.0.67@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
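In every "haven't heard from client ... I think it's dead, and I am evicting it" record above, the trailing cur/expire/last values are Unix epoch seconds: cur - last always equals the advertised silence (227 s in most records, 169 s in one), and cur - expire is a constant 150 s, which appears consistent with the ping evictor's window at the default obd_timeout of 100 s (the actual tunables on this system are not shown). A quick check of the arithmetic using the numbers from one record above:

    import datetime as dt

    # From one eviction record in this log (epoch seconds):
    cur, expire, last = 1575004670, 1575004520, 1575004443

    print(dt.datetime.fromtimestamp(cur, dt.timezone.utc))  # 2019-11-29 05:17:50+00:00
    print("silence before eviction:", cur - last)    # 227 s, as the message says
    print("evictor cutoff window:  ", cur - expire)  # 150 s in every record here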
[2615049.402830] Lustre: fir-MDT0002: Client a3141757-3cde-b831-3c68-2858596623cc (at 10.8.28.1@o2ib6) reconnecting [2615049.413006] Lustre: Skipped 111 previous similar messages [2615049.418618] Lustre: fir-MDT0002: Connection restored to a3141757-3cde-b831-3c68-2858596623cc (at 10.8.28.1@o2ib6) [2615049.429084] Lustre: Skipped 111 previous similar messages [2615087.421325] Lustre: fir-MDT0002: Client c45485d0-a195-e8f9-2514-9c638ae72851 (at 10.8.25.9@o2ib6) reconnecting [2615087.431500] Lustre: Skipped 361 previous similar messages [2615087.437108] Lustre: fir-MDT0002: Connection restored to c45485d0-a195-e8f9-2514-9c638ae72851 (at 10.8.25.9@o2ib6) [2615087.447560] Lustre: Skipped 361 previous similar messages [2625308.255673] Lustre: fir-MDT0002: Connection restored to 7bfc4bab-9b9f-aeec-6b2e-a891c2cd963d (at 10.8.25.17@o2ib6) [2625332.154785] Lustre: fir-MDT0002: haven't heard from client 7bfc4bab-9b9f-aeec-6b2e-a891c2cd963d (at 10.8.25.17@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a811260d000, cur 1575457639 expire 1575457489 last 1575457412 [2630588.292748] Lustre: fir-MDT0002: haven't heard from client c31f9345-fe41-ace3-e06f-9d1afab3d9fa (at 10.9.106.54@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a5fe1988000, cur 1575462895 expire 1575462745 last 1575462668 [2630719.278512] Lustre: fir-MDT0002: Connection restored to 84c0915e-d422-31a4-8bd0-7d29dad31210 (at 10.9.106.54@o2ib4) [2647607.749770] Lustre: fir-MDT0002: haven't heard from client 9345a428-f4b4-bf4e-fc36-dcf56b6b1a06 (at 10.8.24.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a91bedf4c00, cur 1575479914 expire 1575479764 last 1575479687 [2649688.105646] Lustre: fir-MDT0002: Connection restored to 9345a428-f4b4-bf4e-fc36-dcf56b6b1a06 (at 10.8.24.23@o2ib6) [2737767.553376] Lustre: fir-MDT0002: Connection restored to c21e9ff8-f23d-8841-997b-5e239de24468 (at 10.9.101.60@o2ib4) [2744812.141720] Lustre: fir-MDT0002: Connection restored to 3d610722-815c-4925-f168-a753a2fd48f2 (at 10.9.0.1@o2ib4) [2745003.011027] Lustre: fir-MDT0002: Connection restored to 8b327a74-be7d-fd75-0b06-01a3a60b4f4d (at 10.9.0.2@o2ib4) [2745764.335742] Lustre: fir-MDT0002: Connection restored to 637f5748-1417-d3fd-5a8f-53cf6ef775d0 (at 10.8.23.14@o2ib6) [2745803.268924] Lustre: fir-MDT0002: haven't heard from client 4db84553-80b0-2ed1-825f-b8f5ed973677 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a60a011f800, cur 1575578107 expire 1575577957 last 1575577880 [2746119.077455] Lustre: fir-MDT0002: Connection restored to 793bfa9b-0133-cab6-bd0d-2f6b5f3d32dc (at 10.8.26.4@o2ib6) [2746156.290182] Lustre: fir-MDT0002: haven't heard from client 59168559-752b-cc0f-9826-f6d31f85396b (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a742fd3a800, cur 1575578460 expire 1575578310 last 1575578233 [2749767.783420] Lustre: fir-MDT0002: Connection restored to c21e9ff8-f23d-8841-997b-5e239de24468 (at 10.9.101.60@o2ib4) [2750186.513931] Lustre: fir-MDT0002: Connection restored to 72fdf7d1-5425-a15e-9e4c-585c92453d78 (at 10.8.0.5@o2ib6) [2750346.742403] Lustre: fir-MDT0002: Connection restored to fab33075-4473-4 (at 10.8.0.3@o2ib6) [2750350.014734] Lustre: fir-MDT0002: Connection restored to b0966a55-8d50-f88f-2446-bbae7bbc4faf (at 10.9.0.4@o2ib4) [2751219.408056] Lustre: fir-MDT0002: haven't heard from client b660f66d-32a2-3106-63ec-5b1be77c1599 (at 10.9.104.26@o2ib4) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff9a66fce53000, cur 1575583523 expire 1575583373 last 1575583296 [2752165.432017] Lustre: fir-MDT0002: haven't heard from client f92b2e0e-78d1-713e-9a0d-2f3b9a2f05eb (at 10.9.109.37@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a8e2ea80c00, cur 1575584469 expire 1575584319 last 1575584242 [2752165.453977] Lustre: Skipped 7 previous similar messages [2752223.360976] Lustre: fir-MDT0002: Connection restored to f92b2e0e-78d1-713e-9a0d-2f3b9a2f05eb (at 10.9.109.37@o2ib4) [2752298.047153] Lustre: fir-MDT0002: Connection restored to 6219fb63-5fc8-e6f2-51e7-c1495be4c1e4 (at 10.9.115.5@o2ib4) [2752717.321279] Lustre: fir-MDT0002: Connection restored to fc5a3d4d-111a-c396-b61e-9a2380068329 (at 10.8.9.1@o2ib6) [2752777.105873] Lustre: fir-MDT0002: Connection restored to 26ac399e-0ec5-9036-2064-cf367835d3cc (at 10.9.107.24@o2ib4) [2752785.611616] Lustre: fir-MDT0002: Connection restored to f21c0aa1-d268-a7ff-fa0a-39cd24e3bb04 (at 10.9.107.20@o2ib4) [2752802.581962] Lustre: fir-MDT0002: Connection restored to 7bfc4bab-9b9f-aeec-6b2e-a891c2cd963d (at 10.8.25.17@o2ib6) [2753015.743091] Lustre: fir-MDT0002: Connection restored to 1cbb34e0-2311-75c5-c117-d27a4b4dd716 (at 10.9.101.1@o2ib4) [2753115.653270] Lustre: fir-MDT0002: Connection restored to c7462147-979f-1da0-198a-d9891daa225a (at 10.9.107.51@o2ib4) [2753146.456929] Lustre: fir-MDT0002: haven't heard from client 33fedfb4-89cb-b584-6f2b-dc7919b6b0e7 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a59b3d41400, cur 1575585450 expire 1575585300 last 1575585223 [2753248.591622] Lustre: fir-MDT0002: Connection restored to 793bfa9b-0133-cab6-bd0d-2f6b5f3d32dc (at 10.8.26.4@o2ib6) [2753248.602065] Lustre: Skipped 1 previous similar message [2753399.300235] Lustre: fir-MDT0002: Connection restored to 65e0838c-0314-69a9-425b-dacabdad6958 (at 10.8.22.3@o2ib6) [2753399.310673] Lustre: Skipped 3 previous similar messages [2753661.345648] Lustre: fir-MDT0002: Connection restored to 6d806dc4-d503-b778-6e76-2b85109faf26 (at 10.9.104.25@o2ib4) [2753661.356256] Lustre: Skipped 5 previous similar messages [2754610.493332] Lustre: fir-MDT0002: haven't heard from client 36998ce5-20fb-4da2-cfb7-6f251cbab841 (at 10.9.101.57@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a66fce57000, cur 1575586914 expire 1575586764 last 1575586687 [2755133.506351] Lustre: fir-MDT0002: haven't heard from client fcc1cf4a-a103-6faa-4cd9-4b4bc27b0479 (at 10.9.101.58@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6779623800, cur 1575587437 expire 1575587287 last 1575587210 [2756272.639165] Lustre: fir-MDT0002: Connection restored to 637f5748-1417-d3fd-5a8f-53cf6ef775d0 (at 10.8.23.14@o2ib6) [2756272.649690] Lustre: Skipped 1 previous similar message [2756316.536564] Lustre: fir-MDT0002: haven't heard from client cc801970-d305-fff7-4041-0bb98053647f (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9a597a02f800, cur 1575588620 expire 1575588470 last 1575588393 [2756894.012726] Lustre: fir-MDT0002: Connection restored to 36998ce5-20fb-4da2-cfb7-6f251cbab841 (at 10.9.101.57@o2ib4) [2757413.977661] Lustre: fir-MDT0002: Connection restored to fcc1cf4a-a103-6faa-4cd9-4b4bc27b0479 (at 10.9.101.58@o2ib4) [2758507.586860] LNetError: 43254:0:(o2iblnd_cb.c:3350:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds [2758507.597208] LNetError: 43254:0:(o2iblnd_cb.c:3350:kiblnd_check_txs_locked()) Skipped 1 previous similar message [2758507.607470] LNetError: 43254:0:(o2iblnd_cb.c:3425:kiblnd_check_conns()) Timed out RDMA with 10.0.10.3@o2ib7 (56): c: 0, oc: 0, rc: 8 [2758507.619548] LNetError: 43254:0:(o2iblnd_cb.c:3425:kiblnd_check_conns()) Skipped 1 previous similar message [2758507.629623] LNetError: 43257:0:(peer.c:3451:lnet_peer_ni_add_to_recoveryq_locked()) lpni 10.0.10.3@o2ib7 added to recovery queue. Health = 900 [2758507.642579] LNetError: 43257:0:(peer.c:3451:lnet_peer_ni_add_to_recoveryq_locked()) Skipped 3 previous similar messages [2758507.653601] LNetError: 43254:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900 [2758508.444881] Lustre: 44418:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1575590804/real 0] req@ffff9a579dba4c80 x1651387273307504/t0(0) o104->fir-MDT0002@10.0.10.3@o2ib7:15/16 lens 296/224 e 0 to 1 dl 1575590811 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 [2758508.471450] Lustre: 44418:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 11 previous similar messages [2758508.827241] LNetError: 81600:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900 [2758508.839330] LNetError: 81600:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 1 previous similar message [2758510.833239] LNetError: 80882:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900 [2758513.833216] LNetError: 80882:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900 [2758513.845303] LNetError: 80882:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 2 previous similar messages [2758517.838518] LNetError: 80882:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900 [2758517.850609] LNetError: 80882:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 2 previous similar messages [2758518.838505] Lustre: 73639:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1575590819/real 1575590822] req@ffff9a9113298d80 x1651387273533872/t0(0) o104->fir-MDT0002@10.0.10.3@o2ib7:15/16 lens 296/224 e 0 to 1 dl 1575590826 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [2758518.866193] Lustre: 73639:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 22 previous similar messages [2758525.853668] LNetError: 81600:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. 
Health = 900 [2758525.865759] LNetError: 81600:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 5 previous similar messages [2758536.865782] Lustre: 73639:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1575590837/real 1575590840] req@ffff9a9113298d80 x1651387273533872/t0(0) o104->fir-MDT0002@10.0.10.3@o2ib7:15/16 lens 296/224 e 0 to 1 dl 1575590844 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [2758536.893466] Lustre: 73639:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 35 previous similar messages [2758541.873103] LNetError: 81600:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900 [2758541.885192] LNetError: 81600:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 12 previous similar messages [2758569.902730] Lustre: 44414:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1575590870/real 1575590873] req@ffff9a572ecc0900 x1651387273308672/t0(0) o104->fir-MDT0002@10.0.10.3@o2ib7:15/16 lens 296/224 e 0 to 1 dl 1575590877 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [2758569.930438] Lustre: 44414:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 64 previous similar messages [2758573.909771] LNetError: 80882:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900 [2758573.921856] LNetError: 80882:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 25 previous similar messages [2758602.949592] LustreError: 44404:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) ### client (nid 10.0.10.3@o2ib7) failed to reply to blocking AST (req@ffff9a90e33a7980 x1651387273308736 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff9a6c8130b600/0xdeacf8edc2fe6aac lrc: 4/0,0 mode: PR/PR res: [0x2c0033787:0x31e1:0x0].0x0 bits 0x12/0x0 rrc: 5 type: IBT flags: 0x60200400000020 nid: 10.0.10.3@o2ib7 remote: 0x5ba8a2fad886cfe4 expref: 885244 pid: 44310 timeout: 2758628 lvb_type: 0 [2758602.949612] LustreError: 138-a: fir-MDT0002: A client on nid 10.0.10.3@o2ib7 was evicted due to a lock blocking callback time out: rc -110 [2758602.949615] LustreError: Skipped 2 previous similar messages [2758602.949638] LustreError: 43485:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 102s: evicting client at 10.0.10.3@o2ib7 ns: mdt-fir-MDT0002_UUID lock: ffff9a563e864a40/0xdeacf8edc1498291 lrc: 3/0,0 mode: PR/PR res: [0x2c0033d93:0x1e522:0x0].0x0 bits 0x1b/0x0 rrc: 5 type: IBT flags: 0x60200400000020 nid: 10.0.10.3@o2ib7 remote: 0x5ba8a2fad85ce1aa expref: 885245 pid: 44286 timeout: 0 lvb_type: 0 [2758602.949641] LustreError: 43485:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 2 previous similar messages [2758603.058779] LustreError: 44404:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) Skipped 5 previous similar messages [2758615.589503] LNet: 43254:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.3@o2ib7: 0 seconds [2758617.589547] LNet: 43254:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.3@o2ib7: 0 seconds [2758617.599714] LNet: 43254:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 2 previous similar messages [2758622.589671] LNet: 43254:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.3@o2ib7: 1 seconds [2758622.599841] LNet: 43254:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 5 previous similar messages [2758625.589749] LNet: 43254:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.3@o2ib7: 2 seconds 
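The "added to recovery queue. Health = 900" lines come from LNet's health tracking (introduced around Lustre 2.12): each local NI and peer NI starts at a health value of 1000, is docked on a send failure by lnet_health_sensitivity (the drop from 1000 to 900 seen here matches a single decrement of 100), and sits on a recovery queue being pinged until the value climbs back. A value of 900 therefore points at a fresh failure rather than a chronically sick interface. A sketch to tally queue entries and the worst health observed per NID, assuming a saved plain-text copy of this log (path hypothetical):

    import re
    from collections import Counter

    LOG = "fir-md-1-s2.console.log"  # hypothetical saved copy of this log

    queued = re.compile(r"(?:ni|lpni) (\S+) added to recovery queue\. Health = (\d+)")

    counts, worst = Counter(), {}
    for line in open(LOG):
        m = queued.search(line)
        if m:
            nid, health = m.group(1), int(m.group(2))
            counts[nid] += 1
            worst[nid] = min(worst.get(nid, 1000), health)

    for nid, n in counts.most_common():
        print(f"{nid}: queued {n}x, lowest health seen {worst[nid]}")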
[2758625.599920] LNet: 43254:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 12 previous similar messages [2758629.590847] LNet: 43254:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.3@o2ib7: 0 seconds [2758629.601022] LNet: 43254:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 1 previous similar message [2758636.610022] Lustre: 44655:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1575590933/real 1575590933] req@ffff9a6dd2fac800 x1651387277939312/t0(0) o104->fir-MDT0002@10.0.10.3@o2ib7:15/16 lens 296/224 e 0 to 1 dl 1575590940 ref 2 fl Rpc:X/2/ffffffff rc 0/-1 [2758636.637361] Lustre: 44655:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 113 previous similar messages [2758637.591044] LNet: 43254:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.3@o2ib7: 0 seconds [2758637.601218] LNet: 43254:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 15 previous similar messages [2758639.590102] LNetError: 43254:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900 [2758639.602196] LNetError: 43254:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 43 previous similar messages [2758653.591464] LNet: 43254:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.3@o2ib7: 5 seconds [2758653.601639] LNet: 43254:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 34 previous similar messages [2758686.591286] LNet: 43254:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.3@o2ib7: 2 seconds [2758686.601458] LNet: 43254:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 89 previous similar messages [2758706.592330] LustreError: 44655:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) ### client (nid 10.0.10.3@o2ib7) failed to reply to blocking AST (req@ffff9a6dd2fac800 x1651387277939312 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff9a6eab498480/0xdeacf8edc14da582 lrc: 4/0,0 mode: PR/PR res: [0x2c0032302:0xb9dc:0x0].0x0 bits 0x12/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.0.10.3@o2ib7 remote: 0x5ba8a2fad85e8e5c expref: 458934 pid: 44286 timeout: 2758725 lvb_type: 0 [2758706.635451] LustreError: 138-a: fir-MDT0002: A client on nid 10.0.10.3@o2ib7 was evicted due to a lock blocking callback time out: rc -110 [2758706.648054] LustreError: Skipped 3 previous similar messages [2758706.653912] LustreError: 43485:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 102s: evicting client at 10.0.10.3@o2ib7 ns: mdt-fir-MDT0002_UUID lock: ffff9a6eab498480/0xdeacf8edc14da582 lrc: 3/0,0 mode: PR/PR res: [0x2c0032302:0xb9dc:0x0].0x0 bits 0x12/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.0.10.3@o2ib7 remote: 0x5ba8a2fad85e8e5c expref: 458808 pid: 44286 timeout: 0 lvb_type: 0 [2758706.691565] LustreError: 43485:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message [2758710.591931] LustreError: 43492:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) ### client (nid 10.0.10.3@o2ib7) failed to reply to blocking AST (req@ffff9a71b8dab600 x1651387278147424 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff9a5930d55c40/0xdeacf8edc2f13c0b lrc: 4/0,0 mode: PR/PR res: [0x2c0034a21:0x91e8:0x0].0x0 bits 0x1b/0x0 rrc: 5 type: IBT flags: 0x60200400000020 nid: 10.0.10.3@o2ib7 remote: 0x5ba8a2fad87ceca0 expref: 450731 pid: 68805 timeout: 2758722 lvb_type: 0 [2758710.635045] LustreError: 43492:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) Skipped 1 previous similar message 
[2758710.645221] LustreError: 138-a: fir-MDT0002: A client on nid 10.0.10.3@o2ib7 was evicted due to a lock blocking callback time out: rc -110
[2758710.657819] LustreError: Skipped 1 previous similar message
[2758710.663591] LustreError: 43485:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 105s: evicting client at 10.0.10.3@o2ib7 ns: mdt-fir-MDT0002_UUID lock: ffff9a5930d55c40/0xdeacf8edc2f13c0b lrc: 3/0,0 mode: PR/PR res: [0x2c0034a21:0x91e8:0x0].0x0 bits 0x1b/0x0 rrc: 5 type: IBT flags: 0x60200400000020 nid: 10.0.10.3@o2ib7 remote: 0x5ba8a2fad87ceca0 expref: 450588 pid: 68805 timeout: 0 lvb_type: 0
[2758710.701241] LustreError: 43485:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message
[2758713.591998] LustreError: 44462:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) ### client (nid 10.0.10.3@o2ib7) failed to reply to blocking AST (req@ffff9a8e81ec8480 x1651387278147344 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff9a5c58265580/0xdeacf8edad632e08 lrc: 4/0,0 mode: PR/PR res: [0x2c0033620:0x15840:0x0].0x0 bits 0x12/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.0.10.3@o2ib7 remote: 0x5ba8a2fad7e50569 expref: 444668 pid: 44415 timeout: 2758723 lvb_type: 0
[2758713.635203] LustreError: 138-a: fir-MDT0002: A client on nid 10.0.10.3@o2ib7 was evicted due to a lock blocking callback time out: rc -110
[2758713.647833] LustreError: 43485:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 108s: evicting client at 10.0.10.3@o2ib7 ns: mdt-fir-MDT0002_UUID lock: ffff9a5c58265580/0xdeacf8edad632e08 lrc: 3/0,0 mode: PR/PR res: [0x2c0033620:0x15840:0x0].0x0 bits 0x12/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.0.10.3@o2ib7 remote: 0x5ba8a2fad7e50569 expref: 444557 pid: 44415 timeout: 0 lvb_type: 0
[2758725.592290] LustreError: 44462:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) ### client (nid 10.0.10.3@o2ib7) failed to reply to blocking AST (req@ffff9a8e81ece780 x1651387278147360 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff9a5862a5f500/0xdeacf8edad632e63 lrc: 4/0,0 mode: PR/PR res: [0x2c0033620:0x15840:0x0].0x0 bits 0x1b/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.0.10.3@o2ib7 remote: 0x5ba8a2fad7e50570 expref: 421305 pid: 44268 timeout: 2758733 lvb_type: 0
[2758725.635514] LustreError: 44462:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) Skipped 1 previous similar message
[2758725.645714] LustreError: 138-a: fir-MDT0002: A client on nid 10.0.10.3@o2ib7 was evicted due to a lock blocking callback time out: rc -110
[2758725.658334] LustreError: Skipped 1 previous similar message
[2758725.664123] LustreError: 43485:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 120s: evicting client at 10.0.10.3@o2ib7 ns: mdt-fir-MDT0002_UUID lock: ffff9a5862a5f500/0xdeacf8edad632e63 lrc: 3/0,0 mode: PR/PR res: [0x2c0033620:0x15840:0x0].0x0 bits 0x1b/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.0.10.3@o2ib7 remote: 0x5ba8a2fad7e50570 expref: 421171 pid: 44268 timeout: 0 lvb_type: 0
[2758725.701870] LustreError: 43485:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message
[2758753.592958] LNet: 43254:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.3@o2ib7: 0 seconds
[2758753.603133] LNet: 43254:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 99 previous similar messages
[2758761.593182] LustreError: 44408:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) ### client (nid 10.0.10.3@o2ib7) failed to reply to blocking AST (req@ffff9a59ab029b00 x1651387279864752 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff9a5a6a5e9b00/0xdeacf8edc2ed6373 lrc: 4/0,0 mode: PR/PR res: [0x2c0032284:0x125d3:0x0].0x0 bits 0x1b/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.0.10.3@o2ib7 remote: 0x5ba8a2fad87b4ec7 expref: 357237 pid: 44390 timeout: 2758763 lvb_type: 0
[2758761.636412] LustreError: 138-a: fir-MDT0002: A client on nid 10.0.10.3@o2ib7 was evicted due to a lock blocking callback time out: rc -110
[2758761.649032] LustreError: 43485:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 114s: evicting client at 10.0.10.3@o2ib7 ns: mdt-fir-MDT0002_UUID lock: ffff9a5a6a5e9b00/0xdeacf8edc2ed6373 lrc: 3/0,0 mode: PR/PR res: [0x2c0032284:0x125d3:0x0].0x0 bits 0x1b/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.0.10.3@o2ib7 remote: 0x5ba8a2fad87b4ec7 expref: 357144 pid: 44390 timeout: 0 lvb_type: 0
[2758768.593344] LNetError: 43254:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[2758768.605428] LNetError: 43254:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 80 previous similar messages
[2758774.593458] Lustre: 68827:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1575591070/real 1575591070] req@ffff9a54ef6f9b00 x1651387282810784/t0(0) o104->fir-MDT0002@10.0.10.3@o2ib7:15/16 lens 296/224 e 0 to 1 dl 1575591077 ref 2 fl Rpc:X/2/ffffffff rc 0/-1
[2758774.620801] Lustre: 68827:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 58 previous similar messages
[2758805.734221] LNet: Service thread pid 44655 was inactive for 200.77s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[2758805.751334] Pid: 44655, comm: mdt01_090 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 SMP Mon Aug 5 15:28:37 PDT 2019
[2758805.761684] Call Trace:
[2758805.764322] [] ldlm_completion_ast+0x430/0x860 [ptlrpc]
[2758805.771452] [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc]
[2758805.778827] [] mdt_object_local_lock+0x50b/0xb20 [mdt]
[2758805.785830] [] mdt_object_lock_internal+0x70/0x360 [mdt]
[2758805.793008] [] mdt_reint_object_lock+0x2c/0x60 [mdt]
[2758805.799854] [] mdt_reint_striped_lock+0x8c/0x510 [mdt]
[2758805.806859] [] mdt_reint_setattr+0x667/0x1290 [mdt]
[2758805.813618] [] mdt_reint_rec+0x83/0x210 [mdt]
[2758805.819849] [] mdt_reint_internal+0x6e3/0xaf0 [mdt]
[2758805.826591] [] mdt_reint+0x67/0x140 [mdt]
[2758805.832469] [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[2758805.839621] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[2758805.847531] [] ptlrpc_main+0xb2c/0x1460 [ptlrpc]
[2758805.854045] [] kthread+0xd1/0xe0
[2758805.859135] [] ret_from_fork_nospec_begin+0xe/0x21
[2758805.865785] [] 0xffffffffffffffff
[2758805.870989] LustreError: dumping log to /tmp/lustre-log.1575591109.44655
[2758822.594706] LustreError: 68827:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) ### client (nid 10.0.10.3@o2ib7) failed to reply to blocking AST (req@ffff9a54ef6f9b00 x1651387282810784 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff9a5397c3bf00/0xdeacf8edc2fe6a12 lrc: 4/0,0 mode: PR/PR res: [0x2c0033787:0x31e3:0x0].0x0 bits 0x12/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.0.10.3@o2ib7 remote: 0x5ba8a2fad886cf43 expref: 263518 pid: 44310 timeout: 2758840 lvb_type: 0
[2758822.637824] LustreError: 68827:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) Skipped 1 previous similar message
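The records above repeat one cycle: an o2iblnd transmit to 10.0.10.3@o2ib7 times out, the blocking AST gets no reply (rc -110), and expired_lock_main() evicts the client, with most duplicates collapsed into "Skipped N previous similar messages" counters. A minimal Python sketch for summarizing such a capture; the file name console.log and the script itself are assumptions, not part of the report:

    import re
    from collections import Counter

    evict_re = re.compile(r"evicting client at (\S+)"
                          r"|A client on nid (\S+) was evicted"
                          r"|haven't heard from client \S+ \(at (\S+)\)")
    skip_re = re.compile(r"Skipped (\d+) previous similar message")

    evictions = Counter()
    suppressed = 0
    with open("console.log") as log:          # assumed saved copy of this log
        for line in log:
            m = evict_re.search(line)
            if m:
                evictions[next(g for g in m.groups() if g)] += 1
            m = skip_re.search(line)
            if m:
                suppressed += int(m.group(1))  # duplicates hidden by rate limiting

    print(evictions.most_common())
    print("suppressed duplicates:", suppressed)

Counting the suppressed duplicates matters here: the "Skipped N" counters show the real event rate is far higher than the number of printed lines.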
[2758822.648007] LustreError: 138-a: fir-MDT0002: A client on nid 10.0.10.3@o2ib7 was evicted due to a lock blocking callback time out: rc -110
[2758822.660600] LustreError: Skipped 1 previous similar message
[2758822.666377] LustreError: 43485:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 101s: evicting client at 10.0.10.3@o2ib7 ns: mdt-fir-MDT0002_UUID lock: ffff9a5397c3bf00/0xdeacf8edc2fe6a12 lrc: 3/0,0 mode: PR/PR res: [0x2c0033787:0x31e3:0x0].0x0 bits 0x12/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.0.10.3@o2ib7 remote: 0x5ba8a2fad886cf43 expref: 263417 pid: 44310 timeout: 0 lvb_type: 0
[2758822.704036] LustreError: 43485:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message
[2758848.743285] LNet: Service thread pid 44408 was inactive for 200.40s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[2758848.760399] Pid: 44408, comm: mdt02_042 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 SMP Mon Aug 5 15:28:37 PDT 2019
[2758848.770745] Call Trace:
[2758848.773386] [] ldlm_completion_ast+0x430/0x860 [ptlrpc]
[2758848.780518] [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc]
[2758848.787890] [] mdt_object_local_lock+0x50b/0xb20 [mdt]
[2758848.794893] [] mdt_object_lock_internal+0x70/0x360 [mdt]
[2758848.802068] [] mdt_reint_object_lock+0x2c/0x60 [mdt]
[2758848.808889] [] mdt_reint_link+0x7dc/0xc20 [mdt]
[2758848.815286] [] mdt_reint_rec+0x83/0x210 [mdt]
[2758848.821511] [] mdt_reint_internal+0x6e3/0xaf0 [mdt]
[2758848.828263] [] mdt_reint+0x67/0x140 [mdt]
[2758848.834138] [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[2758848.841270] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[2758848.849168] [] ptlrpc_main+0xb2c/0x1460 [ptlrpc]
[2758848.855681] [] kthread+0xd1/0xe0
[2758848.860771] [] ret_from_fork_nospec_begin+0xe/0x21
[2758848.867421] [] 0xffffffffffffffff
[2758848.872629] LustreError: dumping log to /tmp/lustre-log.1575591152.44408
[2758883.596183] LNet: 43254:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.3@o2ib7: 0 seconds
[2758883.606353] LNet: 43254:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 326 previous similar messages
[2758886.727154] LNet: Service thread pid 44655 completed after 281.76s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
[2758921.961110] LNet: Service thread pid 68827 was inactive for 200.28s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[2758921.978220] Pid: 68827, comm: mdt02_102 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 SMP Mon Aug 5 15:28:37 PDT 2019
[2758921.988567] Call Trace:
[2758921.991248] [] ldlm_completion_ast+0x430/0x860 [ptlrpc]
[2758921.998383] [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc]
[2758922.005756] [] mdt_object_local_lock+0x50b/0xb20 [mdt]
[2758922.012766] [] mdt_object_lock_internal+0x70/0x360 [mdt]
[2758922.019952] [] mdt_reint_object_lock+0x2c/0x60 [mdt]
[2758922.026791] [] mdt_reint_striped_lock+0x8c/0x510 [mdt]
[2758922.033805] [] mdt_reint_setattr+0x667/0x1290 [mdt]
[2758922.040558] [] mdt_reint_rec+0x83/0x210 [mdt]
[2758922.046799] [] mdt_reint_internal+0x6e3/0xaf0 [mdt]
[2758922.053576] [] mdt_reint+0x67/0x140 [mdt]
[2758922.059467] [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[2758922.066628] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[2758922.074532] [] ptlrpc_main+0xb2c/0x1460 [ptlrpc]
[2758922.081052] [] kthread+0xd1/0xe0
[2758922.086153] [] ret_from_fork_nospec_begin+0xe/0x21
[2758922.092810] [] 0xffffffffffffffff
[2758922.098017] LustreError: dumping log to /tmp/lustre-log.1575591225.68827
[2758922.110463] LNet: Service thread pid 44408 completed after 273.76s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
[2758974.758134] LNet: Service thread pid 68827 completed after 253.08s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
[2759026.599775] LNetError: 43254:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[2759026.611867] LNetError: 43254:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 162 previous similar messages
[2759146.602774] LNet: 43254:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.3@o2ib7: 1 seconds
[2759146.612944] LNet: 43254:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 147 previous similar messages
[2759272.019077] Lustre: fir-MDT0002: Connection restored to 637f5748-1417-d3fd-5a8f-53cf6ef775d0 (at 10.8.23.14@o2ib6)
[2759310.613233] Lustre: fir-MDT0002: haven't heard from client 16aeb29c-e083-9f5f-14c9-c43ed9d9f884 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a640d969400, cur 1575591614 expire 1575591464 last 1575591387
[2759545.612920] LNetError: 43254:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[2759545.625018] LNetError: 43254:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 51 previous similar messages
[2759666.616012] LNet: 43254:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.3@o2ib7: 1 seconds
[2759666.626180] LNet: 43254:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 51 previous similar messages
[2760151.628456] LNetError: 43254:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[2760151.640544] LNetError: 43254:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 57 previous similar messages
[2760270.631538] LNet: 43254:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.3@o2ib7: 0 seconds
[2760270.641711] LNet: 43254:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 56 previous similar messages
[2760755.644046] LNetError: 43254:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[2760755.656132] LNetError: 43254:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 54 previous similar messages
[2760885.647399] LNet: 43254:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.3@o2ib7: 0 seconds
[2760885.657568] LNet: 43254:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 55 previous similar messages
[2760993.032979] Lustre: fir-MDT0002: Connection restored to 637f5748-1417-d3fd-5a8f-53cf6ef775d0 (at 10.8.23.14@o2ib6)
[2761018.658542] Lustre: fir-MDT0002: haven't heard from client a85afb55-3f5d-bf62-95e1-772365c773fa (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a63255ce400, cur 1575593322 expire 1575593172 last 1575593095
[2761321.664719] Lustre: fir-MDT0002: haven't heard from client 3dc0f271-8eca-7211-fb9e-47a6de7181fc (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a52a93bbc00, cur 1575593625 expire 1575593475 last 1575593398
[2761370.659810] LNetError: 43254:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[2761370.671899] LNetError: 43254:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 53 previous similar messages
[2761434.781558] Lustre: fir-MDT0002: Connection restored to 637f5748-1417-d3fd-5a8f-53cf6ef775d0 (at 10.8.23.14@o2ib6)
[2761490.662896] LNet: 43254:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.3@o2ib7: 0 seconds
[2761490.673066] LNet: 43254:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 51 previous similar messages
[2761661.672597] Lustre: fir-MDT0002: haven't heard from client fbce2819-01ba-5492-d95f-92b4557368e5 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a56b847c000, cur 1575593965 expire 1575593815 last 1575593738
[2761671.496830] Lustre: fir-MDT0002: Connection restored to 637f5748-1417-d3fd-5a8f-53cf6ef775d0 (at 10.8.23.14@o2ib6)
[2761892.250867] Lustre: fir-MDT0002: Connection restored to 637f5748-1417-d3fd-5a8f-53cf6ef775d0 (at 10.8.23.14@o2ib6)
[2761948.682600] Lustre: fir-MDT0002: haven't heard from client 94dc1465-9545-56c3-befd-6510c22f452c (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a76bbe05c00, cur 1575594252 expire 1575594102 last 1575594025
[2761975.675338] LNetError: 43254:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[2761975.687422] LNetError: 43254:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 46 previous similar messages
[2762100.678508] LNet: 43254:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.3@o2ib7: 0 seconds
[2762100.688683] LNet: 43254:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 38 previous similar messages
[2762590.691168] LNetError: 43254:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[2762590.703252] LNetError: 43254:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 45 previous similar messages
[2762705.694119] LNet: 43254:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.3@o2ib7: 0 seconds
[2762705.704290] LNet: 43254:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 45 previous similar messages
[2763201.706886] LNetError: 43254:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[2763201.718968] LNetError: 43254:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 45 previous similar messages
[2763316.709804] LNet: 43254:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.3@o2ib7: 1 seconds
[2763316.719974] LNet: 43254:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 45 previous similar messages
[2763416.289356] Lustre: 68826:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1575595708/real 1575595708] req@ffff9a5abd151200 x1651387311050096/t0(0) o104->fir-MDT0002@10.8.23.14@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1575595719 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[2763416.316782] Lustre: 68826:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 102 previous similar messages
[2763449.328189] Lustre: 68826:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1575595741/real 1575595741] req@ffff9a5abd151200 x1651387311050096/t0(0) o104->fir-MDT0002@10.8.23.14@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1575595752 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[2763449.355611] Lustre: 68826:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
[2763492.726698] Lustre: fir-MDT0002: haven't heard from client e1cd7273-2bbc-d595-423d-2a040ceb8bd1 (at 10.8.24.32@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a91b94e3800, cur 1575595796 expire 1575595646 last 1575595569
[2763492.748728] LustreError: 68826:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) ### client (nid 10.8.23.14@o2ib6) failed to reply to blocking AST (req@ffff9a5abd151200 x1651387311050096 status 0 rc -5), evict it ns: mdt-fir-MDT0002_UUID lock: ffff9a7d4730a640/0xdeacf8edf708ab86 lrc: 3/0,0 mode: PR/PR res: [0x2c0000400:0x5:0x0].0x0 bits 0x13/0x0 rrc: 882 type: IBT flags: 0x54a01400000020 nid: 10.8.23.14@o2ib6 remote: 0xb1db24ae93e5e127 expref: 3 pid: 44499 timeout: 2763560 lvb_type: 0
[2763492.791341] LustreError: 68826:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) Skipped 1 previous similar message
[2763492.801524] LustreError: 138-a: fir-MDT0002: A client on nid 10.8.23.14@o2ib6 was evicted due to a lock blocking callback time out: rc -5
[2763492.814056] LustreError: Skipped 1 previous similar message
[2763508.653124] Lustre: fir-MDT0002: Connection restored to 637f5748-1417-d3fd-5a8f-53cf6ef775d0 (at 10.8.23.14@o2ib6)
[2763788.052691] Lustre: fir-MDT0002: Connection restored to 637f5748-1417-d3fd-5a8f-53cf6ef775d0 (at 10.8.23.14@o2ib6)
[2763805.722257] LNetError: 43254:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[2763805.734342] LNetError: 43254:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 44 previous similar messages
[2763850.729567] Lustre: fir-MDT0002: haven't heard from client 6a322af4-b891-28e9-9a3d-acf3701a2d87 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6b72adfc00, cur 1575596154 expire 1575596004 last 1575595927
[2763850.751446] Lustre: Skipped 3 previous similar messages
[2763930.725478] LNet: 43254:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.3@o2ib7: 0 seconds
[2763930.735646] LNet: 43254:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 44 previous similar messages
[2764038.734357] Lustre: fir-MDT0002: haven't heard from client d7eaf46a-6d28-f47a-ad06-92022cfaefa1 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a77be2d0800, cur 1575596342 expire 1575596192 last 1575596115
[2764172.001160] Lustre: fir-MDT0002: Connection restored to 637f5748-1417-d3fd-5a8f-53cf6ef775d0 (at 10.8.23.14@o2ib6)
[2764372.873087] Lustre: fir-MDT0002: Connection restored to 637f5748-1417-d3fd-5a8f-53cf6ef775d0 (at 10.8.23.14@o2ib6)
[2764407.759560] Lustre: fir-MDT0002: haven't heard from client 65c0a8ba-77a2-39d1-aa40-e70fa84fd89c (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6c68294c00, cur 1575596711 expire 1575596561 last 1575596484
[2764415.737916] LNetError: 43254:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[2764415.749997] LNetError: 43254:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 44 previous similar messages
[2764545.741240] LNet: 43254:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.3@o2ib7: 0 seconds
[2764545.751412] LNet: 43254:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 45 previous similar messages
[2764950.978390] Lustre: fir-MDT0002: Connection restored to 637f5748-1417-d3fd-5a8f-53cf6ef775d0 (at 10.8.23.14@o2ib6)
[2764974.757522] Lustre: fir-MDT0002: haven't heard from client 7c3174d2-d383-5883-5c61-89b163bf438a (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a5cbb64d000, cur 1575597278 expire 1575597128 last 1575597051
[2765015.754396] LNetError: 43254:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[2765015.766481] LNetError: 43254:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 45 previous similar messages
[2765045.422811] Lustre: fir-MDT0002: Connection restored to f21c0aa1-d268-a7ff-fa0a-39cd24e3bb04 (at 10.9.107.20@o2ib4)
[2765179.828615] Lustre: fir-MDT0002: Connection restored to 637f5748-1417-d3fd-5a8f-53cf6ef775d0 (at 10.8.23.14@o2ib6)
[2765179.839138] Lustre: Skipped 3 previous similar messages
[2765228.765541] Lustre: fir-MDT0002: haven't heard from client bd8c6f89-6e53-2076-240f-4892dca3e45f (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a773ee88400, cur 1575597532 expire 1575597382 last 1575597305
[2767117.931326] Lustre: fir-MDT0002: Connection restored to 637f5748-1417-d3fd-5a8f-53cf6ef775d0 (at 10.8.23.14@o2ib6)
[2767163.814780] Lustre: fir-MDT0002: haven't heard from client 584239b1-538d-aeee-6257-7b100204e307 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a721e321c00, cur 1575599467 expire 1575599317 last 1575599240
[2767463.921839] Lustre: fir-MDT0002: Connection restored to 637f5748-1417-d3fd-5a8f-53cf6ef775d0 (at 10.8.23.14@o2ib6)
[2767521.823976] Lustre: fir-MDT0002: haven't heard from client d4a61f8b-a174-3617-9e6b-da0f6cf95b91 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6edbc79000, cur 1575599825 expire 1575599675 last 1575599598
[2768267.843859] Lustre: fir-MDT0002: haven't heard from client 04222f13-75d2-3ac1-1974-8820245c02a2 (at 10.9.109.37@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a8d053e8c00, cur 1575600571 expire 1575600421 last 1575600344
[2768276.032784] Lustre: fir-MDT0002: Connection restored to 637f5748-1417-d3fd-5a8f-53cf6ef775d0 (at 10.8.23.14@o2ib6)
[2768313.675105] Lustre: fir-MDT0002: Connection restored to f92b2e0e-78d1-713e-9a0d-2f3b9a2f05eb (at 10.9.109.37@o2ib4)
[2769482.874869] Lustre: fir-MDT0002: haven't heard from client 12d6febd-ab32-5214-27e9-c08cc2ad8190 (at 10.9.113.12@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a6c556f1000, cur 1575601786 expire 1575601636 last 1575601559
[2769482.896840] Lustre: Skipped 1 previous similar message
[2771240.951763] Lustre: fir-MDT0002: Connection restored to da43b289-9055-873a-0d69-4fa27f091ab2 (at 10.9.113.12@o2ib4)
[2771467.119342] Lustre: fir-MDT0002: Connection restored to 637f5748-1417-d3fd-5a8f-53cf6ef775d0 (at 10.8.23.14@o2ib6)
[2771486.925695] Lustre: fir-MDT0002: haven't heard from client eca3c738-7461-ac35-0f1b-8ee08a4b78e0 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a599ea8d000, cur 1575603790 expire 1575603640 last 1575603563
[2771694.077258] Lustre: fir-MDT0002: Connection restored to 637f5748-1417-d3fd-5a8f-53cf6ef775d0 (at 10.8.23.14@o2ib6)
[2771744.933525] Lustre: fir-MDT0002: haven't heard from client 16f9e407-59f7-d568-686b-694fe322aeb3 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a7a94e75c00, cur 1575604048 expire 1575603898 last 1575603821
[2773032.556285] Lustre: fir-MDT0002: Connection restored to 637f5748-1417-d3fd-5a8f-53cf6ef775d0 (at 10.8.23.14@o2ib6)
[2773039.970041] Lustre: fir-MDT0002: haven't heard from client 8355de1c-1afd-5fce-d3b1-f53c49f06956 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a81529db400, cur 1575605343 expire 1575605193 last 1575605116
[2773474.977225] Lustre: fir-MDT0002: haven't heard from client 10b98365-cf04-43a4-cbaf-e8984c11f9f3 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a7012a08c00, cur 1575605778 expire 1575605628 last 1575605551
[2773672.980305] Lustre: fir-MDT0002: Connection restored to 637f5748-1417-d3fd-5a8f-53cf6ef775d0 (at 10.8.23.14@o2ib6)
[2773856.127689] Lustre: fir-MDT0002: Connection restored to 637f5748-1417-d3fd-5a8f-53cf6ef775d0 (at 10.8.23.14@o2ib6)
[2773898.989490] Lustre: fir-MDT0002: haven't heard from client 4db78a42-d9cd-0f3c-9889-8621461bb90f (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a8a63e8dc00, cur 1575606202 expire 1575606052 last 1575605975
[2774039.197605] Lustre: fir-MDT0002: Connection restored to 637f5748-1417-d3fd-5a8f-53cf6ef775d0 (at 10.8.23.14@o2ib6)
[2774082.994032] Lustre: fir-MDT0002: haven't heard from client 4fc030e6-222d-04e3-3dc1-bcc8af107362 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a7e4c15a000, cur 1575606386 expire 1575606236 last 1575606159
[2778685.113843] Lustre: fir-MDT0002: haven't heard from client 6fda21a0-8a43-b316-2199-d6730c43018e (at 10.9.112.1@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a61b9797400, cur 1575610988 expire 1575610838 last 1575610761
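One regularity worth noting in the ping-evictor entries above ("haven't heard from client ... in 227 seconds"): in every instance the reported timestamps satisfy cur - last = 227 and cur - expire = 150, i.e. the eviction fires 227 s after the client's last contact and 150 s after the export's expire stamp. A quick check with three (cur, expire, last) triplets copied verbatim from entries above (illustrative only, not part of the log):

    samples = [
        (1575591614, 1575591464, 1575591387),
        (1575593322, 1575593172, 1575593095),
        (1575610988, 1575610838, 1575610761),
    ]
    for cur, expire, last in samples:
        assert cur - last == 227    # matches "in 227 seconds" in the message
        assert cur - expire == 150  # eviction runs 150s past the expire stamp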
[2781298.392628] Lustre: fir-MDT0002: Connection restored to 6fda21a0-8a43-b316-2199-d6730c43018e (at 10.9.112.1@o2ib4)
[2783979.287796] Lustre: fir-MDT0002: Connection restored to 637f5748-1417-d3fd-5a8f-53cf6ef775d0 (at 10.8.23.14@o2ib6)
[2784014.255287] Lustre: fir-MDT0002: haven't heard from client df59f635-439f-13b9-cb7f-ecc69cf7a80c (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9a7abbee6c00, cur 1575616317 expire 1575616167 last 1575616090
[2786965.963124] ------------[ cut here ]------------
[2786965.967920] kernel BUG at /tmp/rpmbuild-lustre-sthiell-Xc32PcQQ/BUILD/lustre-2.12.3_2_gb033996/ldiskfs/htree_lock.c:429!
[2786965.978953] invalid opcode: 0000 [#1] SMP
[2786965.983276] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ses enclosure ghash_clmulni_intel dcdbas aesni_intel lrw gf128mul glue_helper ablk_helper ipmi_si cryptd sg ipmi_devintf pcspkr ccp ipmi_msghandler i2c_piix4 k10temp dm_multipath acpi_power_meter dm_mod ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE)
[2786966.055730] ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt mlx5_core(OE) fb_sys_fops ttm mlxfw(OE) devlink ahci libahci mpt3sas(OE) drm tg3 crct10dif_pclmul mlx_compat(OE) crct10dif_common raid_class crc32c_intel libata ptp megaraid_sas scsi_transport_sas drm_panel_orientation_quirks pps_core [last unloaded: libcfs]
[2786966.086761] CPU: 1 PID: 68784 Comm: mdt01_110 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1
[2786966.099526] Hardware name: Dell Inc. PowerEdge R6415/07YXFK, BIOS 1.10.6 08/15/2019
[2786966.107352] task: ffff9a6086df9040 ti: ffff9a7916f4c000 task.ti: ffff9a7916f4c000
[2786966.115003] RIP: 0010:[] [] htree_node_unlock+0x4b4/0x4c0 [ldiskfs]
[2786966.124694] RSP: 0018:ffff9a7916f4f8b0 EFLAGS: 00010246
[2786966.130180] RAX: ffff9a57f63e7000 RBX: 0000000000000001 RCX: ffff9a6611112490
[2786966.137487] RDX: 00000000000000c8 RSI: 0000000000000001 RDI: 0000000000000000
[2786966.144792] RBP: ffff9a7916f4f928 R08: ffff9a7720ec6b60 R09: ffff9a610b87c100
[2786966.152098] R10: 0000000000000000 R11: ffff9a709075811f R12: ffff9a66111124d8
[2786966.159403] R13: 0000000000000000 R14: ffff9a6fcf88d040 R15: ffff9a70907580fc
[2786966.166711] FS: 00007f32e0150700(0000) GS:ffff9a71bf600000(0000) knlGS:0000000000000000
[2786966.174970] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2786966.180890] CR2: 00007f32e0224000 CR3: 0000002035ab2000 CR4: 00000000003407e0
[2786966.188196] Call Trace:
[2786966.190835] [] htree_node_release_all+0x5a/0x80 [ldiskfs]
[2786966.198061] [] htree_unlock+0x22/0x70 [ldiskfs]
[2786966.204423] [] osd_index_ea_delete+0x30e/0xb10 [osd_ldiskfs]
[2786966.211917] [] lod_sub_delete+0x1c8/0x460 [lod]
[2786966.218281] [] ? __ldiskfs_journal_start_sb+0x69/0xe0 [ldiskfs]
[2786966.226026] [] lod_delete+0x24/0x30 [lod]
[2786966.231872] [] __mdd_index_delete_only+0x194/0x250 [mdd]
[2786966.239007] [] __mdd_index_delete+0x46/0x290 [mdd]
[2786966.245631] [] mdd_unlink+0x5f8/0xaa0 [mdd]
[2786966.251658] [] mdo_unlink+0x46/0x48 [mdt]
[2786966.257502] [] mdt_reint_unlink+0xbed/0x14b0 [mdt]
[2786966.264131] [] mdt_reint_rec+0x83/0x210 [mdt]
[2786966.270317] [] mdt_reint_internal+0x6e3/0xaf0 [mdt]
[2786966.277027] [] ? mdt_thread_info_init+0xa4/0x1e0 [mdt]
[2786966.283994] [] mdt_reint+0x67/0x140 [mdt]
[2786966.289890] [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[2786966.296973] [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc]
[2786966.304723] [] ? ktime_get_real_seconds+0xe/0x10 [libcfs]
[2786966.311982] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[2786966.319841] [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc]
[2786966.326802] [] ? __wake_up+0x44/0x50
[2786966.332241] [] ptlrpc_main+0xb2c/0x1460 [ptlrpc]
[2786966.338715] [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc]
[2786966.346283] [] kthread+0xd1/0xe0
[2786966.351335] [] ? insert_kthread_work+0x40/0x40
[2786966.357604] [] ret_from_fork_nospec_begin+0xe/0x21
[2786966.364214] [] ? insert_kthread_work+0x40/0x40
[2786966.370479] Code: 0f 0b 48 8b 45 90 8b 55 8c f3 90 0f a3 10 19 c9 85 c9 75 f5 f0 0f ab 10 19 c9 85 c9 0f 84 a4 fb ff ff eb e5 0f 1f 00 0f 0b 0f 0b <0f> 0b 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 89 f0 48
[2786966.391175] RIP [] htree_node_unlock+0x4b4/0x4c0 [ldiskfs]
[2786966.398516] RSP
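The "invalid opcode" oops above is the expected signature of a BUG() assertion rather than random corruption: on x86-64, BUG() is implemented with the ud2 instruction (bytes 0f 0b), and the byte bracketed in the Code: line is the one RIP pointed at when the trap fired, here htree_node_unlock+0x4b4, i.e. the assertion at htree_lock.c:429 in the ldiskfs htree_lock code. The capture cuts off mid-way through the closing RSP line. A minimal sketch of that check (Python; illustrative, not part of the report):

    import re

    # "Code:" bytes copied verbatim from the oops above; <..> marks RIP.
    code = ("0f 0b 48 8b 45 90 8b 55 8c f3 90 0f a3 10 19 c9 85 c9 75 f5 f0 "
            "0f ab 10 19 c9 85 c9 0f 84 a4 fb ff ff eb e5 0f 1f 00 0f 0b 0f 0b "
            "<0f> 0b 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 89 f0 48")

    m = re.search(r"<([0-9a-f]{2})> ([0-9a-f]{2})", code)
    assert m and m.groups() == ("0f", "0b")   # ud2: the opcode BUG() emits
    print("trap is ud2 -> BUG() at htree_lock.c:429 (htree_node_unlock+0x4b4)")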