[ 5.991328] microcode: CPU19: patch_level=0x08001227
[ 5.991337] microcode: CPU20: patch_level=0x08001227
[ 5.991349] microcode: CPU21: patch_level=0x08001227
[ 5.991359] microcode: CPU22: patch_level=0x08001227
[ 5.991370] microcode: CPU23: patch_level=0x08001227
[ 5.991380] microcode: CPU24: patch_level=0x08001227
[ 5.991388] microcode: CPU25: patch_level=0x08001227
[ 5.991396] microcode: CPU26: patch_level=0x08001227
[ 5.991404] microcode: CPU27: patch_level=0x08001227
[ 5.991412] microcode: CPU28: patch_level=0x08001227
[ 5.991420] microcode: CPU29: patch_level=0x08001227
[ 5.991428] microcode: CPU30: patch_level=0x08001227
[ 5.991439] microcode: CPU31: patch_level=0x08001227
[ 5.991447] microcode: CPU32: patch_level=0x08001227
[ 5.991455] microcode: CPU33: patch_level=0x08001227
[ 5.991464] microcode: CPU34: patch_level=0x08001227
[ 5.991472] microcode: CPU35: patch_level=0x08001227
[ 5.991480] microcode: CPU36: patch_level=0x08001227
[ 5.991488] microcode: CPU37: patch_level=0x08001227
[ 5.991496] microcode: CPU38: patch_level=0x08001227
[ 5.991507] microcode: CPU39: patch_level=0x08001227
[ 5.991515] microcode: CPU40: patch_level=0x08001227
[ 5.991523] microcode: CPU41: patch_level=0x08001227
[ 5.991531] microcode: CPU42: patch_level=0x08001227
[ 5.991539] microcode: CPU43: patch_level=0x08001227
[ 5.991546] microcode: CPU44: patch_level=0x08001227
[ 5.991555] microcode: CPU45: patch_level=0x08001227
[ 5.991563] microcode: CPU46: patch_level=0x08001227
[ 5.991571] microcode: CPU47: patch_level=0x08001227
[ 5.991617] microcode: Microcode Update Driver: v2.01, Peter Oruba
[ 5.991771] PM: Hibernation image not present or could not be loaded.
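A quick sanity check on the microcode block above is that every CPU reports the same patch level. A minimal sketch (not part of the log): extract and deduplicate the `patch_level` fields from a dmesg capture; the here-doc replays two sample lines from the log, and a real run would feed the whole capture in.

```shell
# Extract each patch_level and collapse duplicates; the input lines
# below are copied from the log above.
grep -o 'patch_level=0x[0-9a-f]*' <<'EOF' | sort -u
[ 5.991328] microcode: CPU19: patch_level=0x08001227
[ 5.991337] microcode: CPU20: patch_level=0x08001227
EOF
```

A single line of output (`patch_level=0x08001227` here) means every sampled core runs the same microcode revision; multiple lines would flag a mixed-revision system.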
[ 5.991774] Loading compiled-in X.509 certificates
[ 5.992152] Loaded X.509 cert 'Red Hat Enterprise Linux Driver Update Program (key 3): bf57f3e87362bc7229d9f465321773dfd1f77a80'
[ 5.992521] Loaded X.509 cert 'Red Hat Enterprise Linux kpatch signing key: 4d38fd864ebe18c5f0b72e3852e2014c3a676fc8'
[ 5.992889] Loaded X.509 cert 'Red Hat Enterprise Linux kernel signing key: 26463bf7b35aa6e910b2216d61318fa5ff5b7954'
[ 5.992905] registered taskstats version 1
[ 5.995016] Key type trusted registered
[ 5.996566] Key type encrypted registered
[ 5.996611] IMA: No TPM chip found, activating TPM-bypass! (rc=-19)
[ 5.998972] Magic number: 7:952:465
[ 6.005667] rtc_cmos 00:01: setting system clock to 2019-02-07 05:28:26 UTC (1549517306)
[ 6.377065] Switched to clocksource tsc
[ 6.381982] Freeing unused kernel memory: 1876k freed
[ 6.387272] Write protecting the kernel read-only data: 12288k
[ 6.389222] usb 3-1.1: New USB device found, idVendor=1604, idProduct=10c0
[ 6.389224] usb 3-1.1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[ 6.395988] hub 3-1.1:1.0: USB hub found
[ 6.396345] hub 3-1.1:1.0: 4 ports detected
[ 6.416857] Freeing unused kernel memory: 516k freed
[ 6.423274] Freeing unused kernel memory: 600k freed
[ 6.460390] usb 3-1.4: new high-speed USB device number 4 using xhci_hcd
[ 6.482289] systemd[1]: systemd 219 running in system mode. (+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 -SECCOMP +BLKID +ELFUTILS +KMOD +IDN)
[ 6.492214] usb 1-1: new high-speed USB device number 2 using xhci_hcd
[ 6.507918] systemd[1]: Detected architecture x86-64.
[ 6.512979] systemd[1]: Running in initial RAM disk.
[ 6.526355] systemd[1]: Set hostname to .
[ 6.547102] usb 3-1.4: New USB device found, idVendor=1604, idProduct=10c0
[ 6.553986] usb 3-1.4: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[ 6.566888] systemd[1]: Reached target Local File Systems.
[ 6.578293] systemd[1]: Reached target Swap.
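The certificate messages above can be summarized mechanically. A minimal sketch (assumed workflow, not from the log): pull out the `Loaded X.509 cert '...'` entries so the trusted signing keys and their fingerprints are listed in one place; the here-doc replays two lines from the log, and in practice the input would be piped `dmesg` output.

```shell
# List the signing certificates the kernel reported loading at boot.
grep -o "Loaded X.509 cert '[^']*'" <<'EOF'
[ 5.992152] Loaded X.509 cert 'Red Hat Enterprise Linux Driver Update Program (key 3): bf57f3e87362bc7229d9f465321773dfd1f77a80'
[ 5.992521] Loaded X.509 cert 'Red Hat Enterprise Linux kpatch signing key: 4d38fd864ebe18c5f0b72e3852e2014c3a676fc8'
EOF
```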
[ 6.587482] systemd[1]: Created slice Root Slice.
[ 6.588000] hub 3-1.4:1.0: USB hub found
[ 6.588350] hub 3-1.4:1.0: 4 ports detected
[ 6.605304] systemd[1]: Listening on udev Control Socket.
[ 6.616266] systemd[1]: Reached target Timers.
[ 6.622120] usb 1-1: New USB device found, idVendor=0424, idProduct=2744
[ 6.628933] usb 1-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[ 6.637451] usb 1-1: Product: USB2734
[ 6.641983] usb 1-1: Manufacturer: Microchip Tech
[ 6.646812] systemd[1]: Created slice System Slice.
[ 6.656271] systemd[1]: Reached target Slices.
[ 6.660939] hub 1-1:1.0: USB hub found
[ 6.666246] hub 1-1:1.0: 4 ports detected
[ 6.673307] systemd[1]: Listening on Journal Socket.
[ 6.684756] systemd[1]: Starting Journal Service...
[ 6.694787] systemd[1]: Starting Load Kernel Modules...
[ 6.704741] systemd[1]: Starting dracut cmdline hook...
[ 6.714720] systemd[1]: Starting Create list of required static device nodes for the current kernel...
[ 6.727251] usb 2-1: new SuperSpeed USB device number 2 using xhci_hcd
[ 6.739660] systemd[1]: Starting Setup Virtual Console...
[ 6.748373] usb 2-1: New USB device found, idVendor=0424, idProduct=5744
[ 6.748374] usb 2-1: New USB device strings: Mfr=2, Product=3, SerialNumber=0
[ 6.748375] usb 2-1: Product: USB5734
[ 6.748376] usb 2-1: Manufacturer: Microchip Tech
[ 6.750292] systemd[1]: Listening on udev Kernel Socket.
[ 6.754896] hub 2-1:1.0: USB hub found
[ 6.755246] hub 2-1:1.0: 4 ports detected
[ 6.756304] usb: port power management may be unreliable
[ 6.811283] systemd[1]: Reached target Sockets.
[ 6.821541] systemd[1]: Started Journal Service.
[ 7.010010] mpt3sas: loading out-of-tree module taints kernel.
[ 7.024397] mpt3sas: module verification failed: signature and/or required key missing - tainting kernel
[ 7.038241] pps_core: LinuxPPS API ver. 1 registered
[ 7.044080] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti
[ 7.044680] mpt3sas version 27.00.00.00 loaded
[ 7.046010] mpt3sas_cm0: 64 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (263565260 kB)
[ 7.056258] mpt3sas_cm0: IOC Number : 0
[ 7.056260] mpt3sas_cm0: CurrentHostPageSize is 0: Setting default host page size to 4k
[ 7.056359] mpt3sas 0000:01:00.0: irq 68 for MSI/MSI-X
[ 7.056380] mpt3sas 0000:01:00.0: irq 69 for MSI/MSI-X
[ 7.056404] mpt3sas 0000:01:00.0: irq 70 for MSI/MSI-X
[ 7.056426] mpt3sas 0000:01:00.0: irq 71 for MSI/MSI-X
[ 7.056450] mpt3sas 0000:01:00.0: irq 72 for MSI/MSI-X
[ 7.056471] mpt3sas 0000:01:00.0: irq 73 for MSI/MSI-X
[ 7.056493] mpt3sas 0000:01:00.0: irq 74 for MSI/MSI-X
[ 7.056515] mpt3sas 0000:01:00.0: irq 75 for MSI/MSI-X
[ 7.056536] mpt3sas 0000:01:00.0: irq 76 for MSI/MSI-X
[ 7.056558] mpt3sas 0000:01:00.0: irq 77 for MSI/MSI-X
[ 7.056579] mpt3sas 0000:01:00.0: irq 78 for MSI/MSI-X
[ 7.056599] mpt3sas 0000:01:00.0: irq 79 for MSI/MSI-X
[ 7.056621] mpt3sas 0000:01:00.0: irq 80 for MSI/MSI-X
[ 7.056642] mpt3sas 0000:01:00.0: irq 81 for MSI/MSI-X
[ 7.056662] mpt3sas 0000:01:00.0: irq 82 for MSI/MSI-X
[ 7.056682] mpt3sas 0000:01:00.0: irq 83 for MSI/MSI-X
[ 7.056703] mpt3sas 0000:01:00.0: irq 84 for MSI/MSI-X
[ 7.056725] mpt3sas 0000:01:00.0: irq 85 for MSI/MSI-X
[ 7.056745] mpt3sas 0000:01:00.0: irq 86 for MSI/MSI-X
[ 7.056765] mpt3sas 0000:01:00.0: irq 87 for MSI/MSI-X
[ 7.056787] mpt3sas 0000:01:00.0: irq 88 for MSI/MSI-X
[ 7.056808] mpt3sas 0000:01:00.0: irq 89 for MSI/MSI-X
[ 7.056829] mpt3sas 0000:01:00.0: irq 90 for MSI/MSI-X
[ 7.056850] mpt3sas 0000:01:00.0: irq 91 for MSI/MSI-X
[ 7.056871] mpt3sas 0000:01:00.0: irq 92 for MSI/MSI-X
[ 7.056892] mpt3sas 0000:01:00.0: irq 93 for MSI/MSI-X
[ 7.056912] mpt3sas 0000:01:00.0: irq 94 for MSI/MSI-X
[ 7.056932] mpt3sas 0000:01:00.0: irq 95 for MSI/MSI-X
[ 7.056955] mpt3sas 0000:01:00.0: irq 96 for MSI/MSI-X
[ 7.056976] mpt3sas 0000:01:00.0: irq 97 for MSI/MSI-X
[ 7.056997] mpt3sas 0000:01:00.0: irq 98 for MSI/MSI-X
[ 7.057017] mpt3sas 0000:01:00.0: irq 99 for MSI/MSI-X
[ 7.057038] mpt3sas 0000:01:00.0: irq 100 for MSI/MSI-X
[ 7.057059] mpt3sas 0000:01:00.0: irq 101 for MSI/MSI-X
[ 7.057081] mpt3sas 0000:01:00.0: irq 102 for MSI/MSI-X
[ 7.057100] mpt3sas 0000:01:00.0: irq 103 for MSI/MSI-X
[ 7.057125] mpt3sas 0000:01:00.0: irq 104 for MSI/MSI-X
[ 7.057145] mpt3sas 0000:01:00.0: irq 105 for MSI/MSI-X
[ 7.057165] mpt3sas 0000:01:00.0: irq 106 for MSI/MSI-X
[ 7.057185] mpt3sas 0000:01:00.0: irq 107 for MSI/MSI-X
[ 7.057206] mpt3sas 0000:01:00.0: irq 108 for MSI/MSI-X
[ 7.057233] mpt3sas 0000:01:00.0: irq 109 for MSI/MSI-X
[ 7.057253] mpt3sas 0000:01:00.0: irq 110 for MSI/MSI-X
[ 7.057273] mpt3sas 0000:01:00.0: irq 111 for MSI/MSI-X
[ 7.057296] mpt3sas 0000:01:00.0: irq 112 for MSI/MSI-X
[ 7.057317] mpt3sas 0000:01:00.0: irq 113 for MSI/MSI-X
[ 7.057338] mpt3sas 0000:01:00.0: irq 114 for MSI/MSI-X
[ 7.057358] mpt3sas 0000:01:00.0: irq 115 for MSI/MSI-X
[ 7.058295] mpt3sas0-msix0: PCI-MSI-X enabled: IRQ 68
[ 7.058296] mpt3sas0-msix1: PCI-MSI-X enabled: IRQ 69
[ 7.058297] mpt3sas0-msix2: PCI-MSI-X enabled: IRQ 70
[ 7.058297] mpt3sas0-msix3: PCI-MSI-X enabled: IRQ 71
[ 7.058298] mpt3sas0-msix4: PCI-MSI-X enabled: IRQ 72
[ 7.058299] mpt3sas0-msix5: PCI-MSI-X enabled: IRQ 73
[ 7.058299] mpt3sas0-msix6: PCI-MSI-X enabled: IRQ 74
[ 7.058300] mpt3sas0-msix7: PCI-MSI-X enabled: IRQ 75
[ 7.058300] mpt3sas0-msix8: PCI-MSI-X enabled: IRQ 76
[ 7.058301] mpt3sas0-msix9: PCI-MSI-X enabled: IRQ 77
[ 7.058301] mpt3sas0-msix10: PCI-MSI-X enabled: IRQ 78
[ 7.058302] mpt3sas0-msix11: PCI-MSI-X enabled: IRQ 79
[ 7.058303] mpt3sas0-msix12: PCI-MSI-X enabled: IRQ 80
[ 7.058303] mpt3sas0-msix13: PCI-MSI-X enabled: IRQ 81
[ 7.058304] mpt3sas0-msix14: PCI-MSI-X enabled: IRQ 82
[ 7.058304] mpt3sas0-msix15: PCI-MSI-X enabled: IRQ 83
[ 7.058305] mpt3sas0-msix16: PCI-MSI-X enabled: IRQ 84
[ 7.058305] mpt3sas0-msix17: PCI-MSI-X enabled: IRQ 85
[ 7.058306] mpt3sas0-msix18: PCI-MSI-X enabled: IRQ 86
[ 7.058306] mpt3sas0-msix19: PCI-MSI-X enabled: IRQ 87
[ 7.058307] mpt3sas0-msix20: PCI-MSI-X enabled: IRQ 88
[ 7.058308] mpt3sas0-msix21: PCI-MSI-X enabled: IRQ 89
[ 7.058308] mpt3sas0-msix22: PCI-MSI-X enabled: IRQ 90
[ 7.058309] mpt3sas0-msix23: PCI-MSI-X enabled: IRQ 91
[ 7.058309] mpt3sas0-msix24: PCI-MSI-X enabled: IRQ 92
[ 7.058310] mpt3sas0-msix25: PCI-MSI-X enabled: IRQ 93
[ 7.058310] mpt3sas0-msix26: PCI-MSI-X enabled: IRQ 94
[ 7.058311] mpt3sas0-msix27: PCI-MSI-X enabled: IRQ 95
[ 7.058312] mpt3sas0-msix28: PCI-MSI-X enabled: IRQ 96
[ 7.058312] mpt3sas0-msix29: PCI-MSI-X enabled: IRQ 97
[ 7.058313] mpt3sas0-msix30: PCI-MSI-X enabled: IRQ 98
[ 7.058313] mpt3sas0-msix31: PCI-MSI-X enabled: IRQ 99
[ 7.058314] mpt3sas0-msix32: PCI-MSI-X enabled: IRQ 100
[ 7.058314] mpt3sas0-msix33: PCI-MSI-X enabled: IRQ 101
[ 7.058315] mpt3sas0-msix34: PCI-MSI-X enabled: IRQ 102
[ 7.058316] mpt3sas0-msix35: PCI-MSI-X enabled: IRQ 103
[ 7.058316] mpt3sas0-msix36: PCI-MSI-X enabled: IRQ 104
[ 7.058317] mpt3sas0-msix37: PCI-MSI-X enabled: IRQ 105
[ 7.058317] mpt3sas0-msix38: PCI-MSI-X enabled: IRQ 106
[ 7.058318] mpt3sas0-msix39: PCI-MSI-X enabled: IRQ 107
[ 7.058318] mpt3sas0-msix40: PCI-MSI-X enabled: IRQ 108
[ 7.058319] mpt3sas0-msix41: PCI-MSI-X enabled: IRQ 109
[ 7.058320] mpt3sas0-msix42: PCI-MSI-X enabled: IRQ 110
[ 7.058320] mpt3sas0-msix43: PCI-MSI-X enabled: IRQ 111
[ 7.058321] mpt3sas0-msix44: PCI-MSI-X enabled: IRQ 112
[ 7.058321] mpt3sas0-msix45: PCI-MSI-X enabled: IRQ 113
[ 7.058322] mpt3sas0-msix46: PCI-MSI-X enabled: IRQ 114
[ 7.058322] mpt3sas0-msix47: PCI-MSI-X enabled: IRQ 115
[ 7.058324] mpt3sas_cm0: iomem(0x00000000e1000000), mapped(0xffff9dc39a000000), size(1048576)
[ 7.058325] mpt3sas_cm0: ioport(0x0000000000001000), size(256)
[ 7.071321] mpt3sas_cm0: IOC Number : 0
[ 7.071323] mpt3sas_cm0: CurrentHostPageSize is 0: Setting default host page size to 4k
[ 7.232282] PTP clock support registered
[ 7.232303] megasas: 07.705.02.00-rh1
[ 7.232607] megaraid_sas 0000:c1:00.0: FW now in Ready state
[ 7.232610] megaraid_sas 0000:c1:00.0: 64 bit DMA mask and 32 bit consistent mask
[ 7.232956] megaraid_sas 0000:c1:00.0: irq 117 for MSI/MSI-X
[ 7.232982] megaraid_sas 0000:c1:00.0: irq 118 for MSI/MSI-X
[ 7.233005] megaraid_sas 0000:c1:00.0: irq 119 for MSI/MSI-X
[ 7.233029] megaraid_sas 0000:c1:00.0: irq 120 for MSI/MSI-X
[ 7.233051] megaraid_sas 0000:c1:00.0: irq 121 for MSI/MSI-X
[ 7.233074] megaraid_sas 0000:c1:00.0: irq 122 for MSI/MSI-X
[ 7.233097] megaraid_sas 0000:c1:00.0: irq 123 for MSI/MSI-X
[ 7.233120] megaraid_sas 0000:c1:00.0: irq 124 for MSI/MSI-X
[ 7.233143] megaraid_sas 0000:c1:00.0: irq 125 for MSI/MSI-X
[ 7.233166] megaraid_sas 0000:c1:00.0: irq 126 for MSI/MSI-X
[ 7.233189] megaraid_sas 0000:c1:00.0: irq 127 for MSI/MSI-X
[ 7.233212] megaraid_sas 0000:c1:00.0: irq 128 for MSI/MSI-X
[ 7.233251] megaraid_sas 0000:c1:00.0: irq 129 for MSI/MSI-X
[ 7.233274] megaraid_sas 0000:c1:00.0: irq 130 for MSI/MSI-X
[ 7.233297] megaraid_sas 0000:c1:00.0: irq 131 for MSI/MSI-X
[ 7.233320] megaraid_sas 0000:c1:00.0: irq 132 for MSI/MSI-X
[ 7.233344] megaraid_sas 0000:c1:00.0: irq 133 for MSI/MSI-X
[ 7.233368] megaraid_sas 0000:c1:00.0: irq 134 for MSI/MSI-X
[ 7.233391] megaraid_sas 0000:c1:00.0: irq 135 for MSI/MSI-X
[ 7.233414] megaraid_sas 0000:c1:00.0: irq 136 for MSI/MSI-X
[ 7.233438] megaraid_sas 0000:c1:00.0: irq 137 for MSI/MSI-X
[ 7.233462] megaraid_sas 0000:c1:00.0: irq 138 for MSI/MSI-X
[ 7.233486] megaraid_sas 0000:c1:00.0: irq 139 for MSI/MSI-X
[ 7.233510] megaraid_sas 0000:c1:00.0: irq 140 for MSI/MSI-X
[ 7.233539] megaraid_sas 0000:c1:00.0: irq 141 for MSI/MSI-X
[ 7.233563] megaraid_sas 0000:c1:00.0: irq 142 for MSI/MSI-X
[ 7.233587] megaraid_sas 0000:c1:00.0: irq 143 for MSI/MSI-X
[ 7.233611] megaraid_sas 0000:c1:00.0: irq 144 for MSI/MSI-X
[ 7.233635] megaraid_sas 0000:c1:00.0: irq 145 for MSI/MSI-X
[ 7.233660] megaraid_sas 0000:c1:00.0: irq 146 for MSI/MSI-X
[ 7.233683] megaraid_sas 0000:c1:00.0: irq 147 for MSI/MSI-X
[ 7.233707] megaraid_sas 0000:c1:00.0: irq 148 for MSI/MSI-X
[ 7.233731] megaraid_sas 0000:c1:00.0: irq 149 for MSI/MSI-X
[ 7.233761] megaraid_sas 0000:c1:00.0: irq 150 for MSI/MSI-X
[ 7.233786] megaraid_sas 0000:c1:00.0: irq 151 for MSI/MSI-X
[ 7.233808] megaraid_sas 0000:c1:00.0: irq 152 for MSI/MSI-X
[ 7.233831] megaraid_sas 0000:c1:00.0: irq 153 for MSI/MSI-X
[ 7.233856] megaraid_sas 0000:c1:00.0: irq 154 for MSI/MSI-X
[ 7.233879] megaraid_sas 0000:c1:00.0: irq 155 for MSI/MSI-X
[ 7.233901] megaraid_sas 0000:c1:00.0: irq 156 for MSI/MSI-X
[ 7.233925] megaraid_sas 0000:c1:00.0: irq 157 for MSI/MSI-X
[ 7.233943] megaraid_sas 0000:c1:00.0: irq 158 for MSI/MSI-X
[ 7.233963] megaraid_sas 0000:c1:00.0: irq 159 for MSI/MSI-X
[ 7.233983] megaraid_sas 0000:c1:00.0: irq 160 for MSI/MSI-X
[ 7.234001] megaraid_sas 0000:c1:00.0: irq 161 for MSI/MSI-X
[ 7.234020] megaraid_sas 0000:c1:00.0: irq 162 for MSI/MSI-X
[ 7.234041] megaraid_sas 0000:c1:00.0: irq 163 for MSI/MSI-X
[ 7.234060] megaraid_sas 0000:c1:00.0: irq 164 for MSI/MSI-X
[ 7.234259] megaraid_sas 0000:c1:00.0: firmware supports msix : (96)
[ 7.234261] megaraid_sas 0000:c1:00.0: current msix/online cpus : (48/48)
[ 7.234262] megaraid_sas 0000:c1:00.0: RDPQ mode : (disabled)
[ 7.234264] megaraid_sas 0000:c1:00.0: Current firmware supports maximum commands: 928 LDIO threshold: 237
[ 7.234603] megaraid_sas 0000:c1:00.0: Configured max firmware commands: 927
[ 7.237741] megaraid_sas 0000:c1:00.0: FW supports sync cache : No
[ 7.260617] mpt3sas_cm0: Allocated physical memory: size(38831 kB)
[ 7.260619] mpt3sas_cm0: Current Controller Queue Depth(7564), Max Controller Queue Depth(7680)
[ 7.260619] mpt3sas_cm0: Scatter Gather Elements per IO(128)
[ 7.261802] Request for unknown module key 'Mellanox Technologies signing key: 61feb074fc7292f958419386ffdd9d5ca999e403' err -11
[ 7.378129] mpt3sas_cm0: FW Package Version(08.00.00.00)
[ 7.378494] mpt3sas_cm0: SAS3616: FWVersion(08.00.00.00), ChipRevision(0x02), BiosVersion(00.00.00.00)
[ 7.378500] mpt3sas_cm0: Protocol=(Initiator,Target,NVMe), Capabilities=(TLR,EEDP,Diag Trace Buffer,Task Set Full,NCQ)
[ 7.378602] mpt3sas_cm0: : host protection capabilities enabled DIF1 DIF2 DIF3
[ 7.378613] scsi host0: Fusion MPT SAS Host
[ 7.379074] mpt3sas_cm0: sending port enable !!
[ 7.488975] tg3.c:v3.137 (May 11, 2014)
[ 7.489078] libata version 3.00 loaded.
[ 7.498091] Compat-mlnx-ofed backport release: b4fdfac
[ 7.498092] Backport based on mlnx_ofed/mlnx-ofa_kernel-4.0.git b4fdfac
[ 7.498092] compat.git: mlnx_ofed/mlnx-ofa_kernel-4.0.git
[ 7.502612] Request for unknown module key 'Mellanox Technologies signing key: 61feb074fc7292f958419386ffdd9d5ca999e403' err -11
[ 7.503724] Request for unknown module key 'Mellanox Technologies signing key: 61feb074fc7292f958419386ffdd9d5ca999e403' err -11
[ 7.513661] tg3 0000:81:00.0 eth0: Tigon3 [partno(BCM95720) rev 5720000] (PCI Express) MAC address d0:94:66:34:4a:7d
[ 7.513663] tg3 0000:81:00.0 eth0: attached PHY is 5720C (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[1])
[ 7.513665] tg3 0000:81:00.0 eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] TSOcap[1]
[ 7.513667] tg3 0000:81:00.0 eth0: dma_rwctrl[00000001] dma_mask[64-bit]
[ 7.532196] tg3 0000:81:00.1 eth1: Tigon3 [partno(BCM95720) rev 5720000] (PCI Express) MAC address d0:94:66:34:4a:7e
[ 7.532197] tg3 0000:81:00.1 eth1: attached PHY is 5720C (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[1])
[ 7.532199] tg3 0000:81:00.1 eth1: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] TSOcap[1]
[ 7.532200] tg3 0000:81:00.1 eth1: dma_rwctrl[00000001] dma_mask[64-bit]
[ 7.552262] mlx5_core 0000:84:00.0: firmware version: 12.24.1000
[ 7.552281] ahci 0000:86:00.2: version 3.0
[ 7.552372] mlx5_core 0000:84:00.0: 126.016 Gb/s available PCIe bandwidth (8 GT/s x16 link)
[ 7.553156] ahci 0000:86:00.2: irq 169 for MSI/MSI-X
[ 7.553173] ahci 0000:86:00.2: irq 170 for MSI/MSI-X
[ 7.553176] ahci 0000:86:00.2: irq 171 for MSI/MSI-X
[ 7.553180] ahci 0000:86:00.2: irq 172 for MSI/MSI-X
[ 7.553184] ahci 0000:86:00.2: irq 173 for MSI/MSI-X
[ 7.553188] ahci 0000:86:00.2: irq 174 for MSI/MSI-X
[ 7.553191] ahci 0000:86:00.2: irq 175 for MSI/MSI-X
[ 7.553194] ahci 0000:86:00.2: irq 176 for MSI/MSI-X
[ 7.553197] ahci 0000:86:00.2: irq 177 for MSI/MSI-X
[ 7.553200] ahci 0000:86:00.2: irq 178 for MSI/MSI-X
[ 7.553203] ahci 0000:86:00.2: irq 179 for MSI/MSI-X
[ 7.553207] ahci 0000:86:00.2: irq 180 for MSI/MSI-X
[ 7.553209] ahci 0000:86:00.2: irq 181 for MSI/MSI-X
[ 7.553213] ahci 0000:86:00.2: irq 182 for MSI/MSI-X
[ 7.553216] ahci 0000:86:00.2: irq 183 for MSI/MSI-X
[ 7.553219] ahci 0000:86:00.2: irq 184 for MSI/MSI-X
[ 7.553264] ahci 0000:86:00.2: AHCI 0001.0301 32 slots 1 ports 6 Gbps 0x1 impl SATA mode
[ 7.553266] ahci 0000:86:00.2: flags: 64bit ncq sntf ilck pm led clo only pmp fbs pio slum part
[ 7.553592] scsi host2: ahci
[ 7.553672] ata1: SATA max UDMA/133 abar m4096@0xc0a02000 port 0xc0a02100 irq 169
[ 7.594246] megaraid_sas 0000:c1:00.0: Init cmd return status SUCCESS for SCSI host 1
[ 7.615238] megaraid_sas 0000:c1:00.0: firmware type : Legacy(64 VD) firmware
[ 7.615240] megaraid_sas 0000:c1:00.0: controller type : iMR(0MB)
[ 7.615241] megaraid_sas 0000:c1:00.0: Online Controller Reset(OCR) : Enabled
[ 7.615242] megaraid_sas 0000:c1:00.0: Secure JBOD support : No
[ 7.615243] megaraid_sas 0000:c1:00.0: NVMe passthru support : No
[ 7.636761] megaraid_sas 0000:c1:00.0: INIT adapter done
[ 7.636764] megaraid_sas 0000:c1:00.0: Jbod map is not supported megasas_setup_jbod_map 5146
[ 7.662681] megaraid_sas 0000:c1:00.0: pci id : (0x1000)/(0x005f)/(0x1028)/(0x1f4b)
[ 7.662683] megaraid_sas 0000:c1:00.0: unevenspan support : yes
[ 7.662684] megaraid_sas 0000:c1:00.0: firmware crash dump : no
[ 7.662685] megaraid_sas 0000:c1:00.0: jbod sync map : no
[ 7.662690] scsi host1: Avago SAS based MegaRAID driver
[ 7.683809] scsi 1:2:0:0: Direct-Access DELL PERC H330 Mini 4.29 PQ: 0 ANSI: 5
[ 7.861255] ata1: SATA link down (SStatus 0 SControl 300)
[ 9.557232] mlx5_core 0000:84:00.0: irq 185 for MSI/MSI-X
[ 9.557255] mlx5_core 0000:84:00.0: irq 186 for MSI/MSI-X
[ 9.557276] mlx5_core 0000:84:00.0: irq 187 for MSI/MSI-X
[ 9.557308] mlx5_core 0000:84:00.0: irq 188 for MSI/MSI-X
[ 9.557326] mlx5_core 0000:84:00.0: irq 189 for MSI/MSI-X
[ 9.557348] mlx5_core 0000:84:00.0: irq 190 for MSI/MSI-X
[ 9.557368] mlx5_core 0000:84:00.0: irq 191 for MSI/MSI-X
[ 9.557392] mlx5_core 0000:84:00.0: irq 192 for MSI/MSI-X
[ 9.557412] mlx5_core 0000:84:00.0: irq 193 for MSI/MSI-X
[ 9.557432] mlx5_core 0000:84:00.0: irq 194 for MSI/MSI-X
[ 9.557451] mlx5_core 0000:84:00.0: irq 195 for MSI/MSI-X
[ 9.557473] mlx5_core 0000:84:00.0: irq 196 for MSI/MSI-X
[ 9.557494] mlx5_core 0000:84:00.0: irq 197 for MSI/MSI-X
[ 9.557513] mlx5_core 0000:84:00.0: irq 198 for MSI/MSI-X
[ 9.557532] mlx5_core 0000:84:00.0: irq 199 for MSI/MSI-X
[ 9.557552] mlx5_core 0000:84:00.0: irq 200 for MSI/MSI-X
[ 9.557570] mlx5_core 0000:84:00.0: irq 201 for MSI/MSI-X
[ 9.557589] mlx5_core 0000:84:00.0: irq 202 for MSI/MSI-X
[ 9.557611] mlx5_core 0000:84:00.0: irq 203 for MSI/MSI-X
[ 9.557629] mlx5_core 0000:84:00.0: irq 204 for MSI/MSI-X
[ 9.557648] mlx5_core 0000:84:00.0: irq 205 for MSI/MSI-X
[ 9.557668] mlx5_core 0000:84:00.0: irq 206 for MSI/MSI-X
[ 9.557687] mlx5_core 0000:84:00.0: irq 207 for MSI/MSI-X
[ 9.557706] mlx5_core 0000:84:00.0: irq 208 for MSI/MSI-X
[ 9.557723] mlx5_core 0000:84:00.0: irq 209 for MSI/MSI-X
[ 9.557742] mlx5_core 0000:84:00.0: irq 210 for MSI/MSI-X
[ 9.557763] mlx5_core 0000:84:00.0: irq 211 for MSI/MSI-X
[ 9.557782] mlx5_core 0000:84:00.0: irq 212 for MSI/MSI-X
[ 9.557801] mlx5_core 0000:84:00.0: irq 213 for MSI/MSI-X
[ 9.557819] mlx5_core 0000:84:00.0: irq 214 for MSI/MSI-X
[ 9.557838] mlx5_core 0000:84:00.0: irq 215 for MSI/MSI-X
[ 9.557858] mlx5_core 0000:84:00.0: irq 216 for MSI/MSI-X
[ 9.557876] mlx5_core 0000:84:00.0: irq 217 for MSI/MSI-X
[ 9.557895] mlx5_core 0000:84:00.0: irq 218 for MSI/MSI-X
[ 9.557913] mlx5_core 0000:84:00.0: irq 219 for MSI/MSI-X
[ 9.557931] mlx5_core 0000:84:00.0: irq 220 for MSI/MSI-X
[ 9.557950] mlx5_core 0000:84:00.0: irq 221 for MSI/MSI-X
[ 9.557972] mlx5_core 0000:84:00.0: irq 222 for MSI/MSI-X
[ 9.557990] mlx5_core 0000:84:00.0: irq 223 for MSI/MSI-X
[ 9.558009] mlx5_core 0000:84:00.0: irq 224 for MSI/MSI-X
[ 9.558029] mlx5_core 0000:84:00.0: irq 225 for MSI/MSI-X
[ 9.558047] mlx5_core 0000:84:00.0: irq 226 for MSI/MSI-X
[ 9.558066] mlx5_core 0000:84:00.0: irq 227 for MSI/MSI-X
[ 9.558084] mlx5_core 0000:84:00.0: irq 228 for MSI/MSI-X
[ 9.558104] mlx5_core 0000:84:00.0: irq 229 for MSI/MSI-X
[ 9.558131] mlx5_core 0000:84:00.0: irq 230 for MSI/MSI-X
[ 9.558150] mlx5_core 0000:84:00.0: irq 231 for MSI/MSI-X
[ 9.558170] mlx5_core 0000:84:00.0: irq 232 for MSI/MSI-X
[ 9.558189] mlx5_core 0000:84:00.0: irq 233 for MSI/MSI-X
[ 9.558208] mlx5_core 0000:84:00.0: irq 234 for MSI/MSI-X
[ 9.558226] mlx5_core 0000:84:00.0: irq 235 for MSI/MSI-X
[ 9.558244] mlx5_core 0000:84:00.0: irq 236 for MSI/MSI-X
[ 9.560777] mlx5_core 0000:84:00.0: Port module event: module 0, Cable plugged
[ 9.725132] mlx5_core 0000:84:00.0: FW Tracer Owner
[ 9.731460] Request for unknown module key 'Mellanox Technologies signing key: 61feb074fc7292f958419386ffdd9d5ca999e403' err -11
[ 9.750334] Request for unknown module key 'Mellanox Technologies signing key: 61feb074fc7292f958419386ffdd9d5ca999e403' err -11
[ 9.763876] Request for unknown module key 'Mellanox Technologies signing key: 61feb074fc7292f958419386ffdd9d5ca999e403' err -11
[ 9.778601] mlx5_ib: Mellanox Connect-IB Infiniband driver v4.5-1.0.1
[ 9.785053] mlx5_ib: Mellanox Connect-IB Infiniband driver v4.5-1.0.1
[ 9.829357] mpt3sas_cm0: hba_port entry: ffff8997fc7c0480, port: 255 is added to hba_port list
[ 9.839795] mpt3sas_cm0: host_add: handle(0x0001), sas_addr(0x500605b00db90c00), phys(17)
[ 9.848297] mpt3sas_cm0: detecting: handle(0x0011), sas_address(0x510600b00db90c00), phy(16)
[ 9.856740] mpt3sas_cm0: REPORT_LUNS: handle(0x0011), retries(0)
[ 9.862775] mpt3sas_cm0: TEST_UNIT_READY: handle(0x0011), lun(0)
[ 9.869144] scsi 0:0:0:0: Enclosure LSI virtualSES 02 PQ: 0 ANSI: 6
[ 9.877272] scsi 0:0:0:0: set ignore_delay_remove for handle(0x0011)
[ 9.883625] scsi 0:0:0:0: SES: handle(0x0011), sas_addr(0x510600b00db90c00), phy(16), device_name(0x510600b00db90c00)
[ 9.894224] scsi 0:0:0:0: enclosure logical id(0x500605b00db90c00), slot(16)
[ 9.901355] scsi 0:0:0:0: enclosure level(0x0000), connector name( C3 )
[ 9.908077] scsi 0:0:0:0: serial_number(500605B00DB90C00)
[ 9.913486] scsi 0:0:0:0: qdepth(1), tagged(0), simple(0), ordered(0), scsi_level(7), cmd_que(0)
[ 9.922294] mpt3sas_cm0: log_info(0x31200206): originator(PL), code(0x20), sub_code(0x0206)
[ 9.955591] mpt3sas_cm0: detecting: handle(0x0012), sas_address(0x500a0984db2fa920), phy(8)
[ 9.963946] mpt3sas_cm0: REPORT_LUNS: handle(0x0012), retries(0)
[ 10.022819] random: crng init done
[ 10.061360] mpt3sas_cm0: REPORT_LUNS: handle(0x0012), retries(1)
[ 10.068440] mpt3sas_cm0: TEST_UNIT_READY: handle(0x0012), lun(0)
[ 10.074741] mpt3sas_cm0: detecting: handle(0x0012), sas_address(0x500a0984db2fa920), phy(8)
[ 10.083107] mpt3sas_cm0: REPORT_LUNS: handle(0x0012), retries(0)
[ 10.089839] mpt3sas_cm0: TEST_UNIT_READY: handle(0x0012), lun(0)
[ 10.096102] mpt3sas_cm0: detecting: handle(0x0012), sas_address(0x500a0984db2fa920), phy(8)
[ 10.104468] mpt3sas_cm0: REPORT_LUNS: handle(0x0012), retries(0)
[ 10.111181] mpt3sas_cm0: TEST_UNIT_READY: handle(0x0012), lun(0)
[ 10.117707] scsi 0:0:1:0: Direct-Access DELL MD34xx 0825 PQ: 0 ANSI: 5
[ 10.125888] scsi 0:0:1:0: SSP: handle(0x0012), sas_addr(0x500a0984db2fa920), phy(8), device_name(0x500a0984db2fa920)
[ 10.136402] scsi 0:0:1:0: enclosure logical id(0x500605b00db90c00), slot(5)
[ 10.143447] scsi 0:0:1:0: enclosure level(0x0000), connector name( C1 )
[ 10.150167] scsi 0:0:1:0: serial_number(021815000354 )
[ 10.155566] scsi 0:0:1:0: qdepth(254), tagged(1), simple(0), ordered(0), scsi_level(6), cmd_que(1)
[ 10.176434] scsi 0:0:1:1: Direct-Access DELL MD34xx 0825 PQ: 0 ANSI: 5
[ 10.184597] scsi 0:0:1:1: SSP: handle(0x0012), sas_addr(0x500a0984db2fa920), phy(8), device_name(0x500a0984db2fa920)
[ 10.195112] scsi 0:0:1:1: enclosure logical id(0x500605b00db90c00), slot(5)
[ 10.202156] scsi 0:0:1:1: enclosure level(0x0000), connector name( C1 )
[ 10.208877] scsi 0:0:1:1: serial_number(021815000354 )
[ 10.214277] scsi 0:0:1:1: qdepth(254), tagged(1), simple(0), ordered(0), scsi_level(6), cmd_que(1)
[ 10.223464] scsi 0:0:1:1: Mode parameters changed
[ 10.243521] scsi 0:0:1:2: Direct-Access DELL MD34xx 0825 PQ: 0 ANSI: 5
[ 10.251696] scsi 0:0:1:2: SSP: handle(0x0012), sas_addr(0x500a0984db2fa920), phy(8), device_name(0x500a0984db2fa920)
[ 10.262211] scsi 0:0:1:2: enclosure logical id(0x500605b00db90c00), slot(5)
[ 10.269257] scsi 0:0:1:2: enclosure level(0x0000), connector name( C1 )
[ 10.275975] scsi 0:0:1:2: serial_number(021815000354 )
[ 10.281376] scsi 0:0:1:2: qdepth(254), tagged(1), simple(0), ordered(0), scsi_level(6), cmd_que(1)
[ 10.290552] scsi 0:0:1:2: Mode parameters changed
[ 10.311511] scsi 0:0:1:31: Direct-Access DELL Universal Xport 0825 PQ: 0 ANSI: 5
[ 10.319778] scsi 0:0:1:31: SSP: handle(0x0012), sas_addr(0x500a0984db2fa920), phy(8), device_name(0x500a0984db2fa920)
[ 10.330376] scsi 0:0:1:31: enclosure logical id(0x500605b00db90c00), slot(5)
[ 10.337508] scsi 0:0:1:31: enclosure level(0x0000), connector name( C1 )
[ 10.344316] scsi 0:0:1:31: serial_number(021815000354 )
[ 10.349801] scsi 0:0:1:31: qdepth(254), tagged(1), simple(0), ordered(0), scsi_level(6), cmd_que(1)
[ 10.372901] mpt3sas_cm0: detecting: handle(0x0013), sas_address(0x500a0984da0f9b14), phy(12)
[ 10.381343] mpt3sas_cm0: REPORT_LUNS: handle(0x0013), retries(0)
[ 10.387506] mpt3sas_cm0: REPORT_LUNS: handle(0x0013), retries(1)
[ 10.394946] mpt3sas_cm0: TEST_UNIT_READY: handle(0x0013), lun(0)
[ 10.401290] mpt3sas_cm0: detecting: handle(0x0013), sas_address(0x500a0984da0f9b14), phy(12)
[ 10.409742] mpt3sas_cm0: REPORT_LUNS: handle(0x0013), retries(0)
[ 10.416413] mpt3sas_cm0: TEST_UNIT_READY: handle(0x0013), lun(0)
[ 10.425607] scsi 0:0:2:0: Direct-Access DELL MD34xx 0825 PQ: 0 ANSI: 5
[ 10.433797] scsi 0:0:2:0: SSP: handle(0x0013), sas_addr(0x500a0984da0f9b14), phy(12), device_name(0x500a0984da0f9b14)
[ 10.444398] scsi 0:0:2:0: enclosure logical id(0x500605b00db90c00), slot(1)
[ 10.451443] scsi 0:0:2:0: enclosure level(0x0000), connector name( C0 )
[ 10.458162] scsi 0:0:2:0: serial_number(021812047179 )
[ 10.463563] scsi 0:0:2:0: qdepth(254), tagged(1), simple(0), ordered(0), scsi_level(6), cmd_que(1)
[ 10.484238] scsi 0:0:2:1: Direct-Access DELL MD34xx 0825 PQ: 0 ANSI: 5
[ 10.492408] scsi 0:0:2:1: SSP: handle(0x0013), sas_addr(0x500a0984da0f9b14), phy(12), device_name(0x500a0984da0f9b14)
[ 10.503010] scsi 0:0:2:1: enclosure logical id(0x500605b00db90c00), slot(1)
[ 10.510056] scsi 0:0:2:1: enclosure level(0x0000), connector name( C0 )
[ 10.516782] scsi 0:0:2:1: serial_number(021812047179 )
[ 10.522185] scsi 0:0:2:1: qdepth(254), tagged(1), simple(0), ordered(0), scsi_level(6), cmd_que(1)
[ 10.544519] scsi 0:0:2:2: Direct-Access DELL MD34xx 0825 PQ: 0 ANSI: 5
[ 10.552694] scsi 0:0:2:2: SSP: handle(0x0013), sas_addr(0x500a0984da0f9b14), phy(12), device_name(0x500a0984da0f9b14)
[ 10.563289] scsi 0:0:2:2: enclosure logical id(0x500605b00db90c00), slot(1)
[ 10.570335] scsi 0:0:2:2: enclosure level(0x0000), connector name( C0 )
[ 10.577055] scsi 0:0:2:2: serial_number(021812047179 )
[ 10.582456] scsi 0:0:2:2: qdepth(254), tagged(1), simple(0), ordered(0), scsi_level(6), cmd_que(1)
[ 10.602521] scsi 0:0:2:31: Direct-Access DELL Universal Xport 0825 PQ: 0 ANSI: 5
[ 10.610775] scsi 0:0:2:31: SSP: handle(0x0013), sas_addr(0x500a0984da0f9b14), phy(12), device_name(0x500a0984da0f9b14)
[ 10.621461] scsi 0:0:2:31: enclosure logical id(0x500605b00db90c00), slot(1)
[ 10.628593] scsi 0:0:2:31: enclosure level(0x0000), connector name( C0 )
[ 10.635399] scsi 0:0:2:31: serial_number(021812047179 )
[ 10.640887] scsi 0:0:2:31: qdepth(254), tagged(1), simple(0), ordered(0), scsi_level(6), cmd_que(1)
[ 10.661103] mpt3sas_cm0: detecting: handle(0x0014), sas_address(0x500a0984dfa1fa20), phy(0)
[ 10.669456] mpt3sas_cm0: REPORT_LUNS: handle(0x0014), retries(0)
[ 10.675594] mpt3sas_cm0: REPORT_LUNS: handle(0x0014), retries(1)
[ 10.685028] mpt3sas_cm0: TEST_UNIT_READY: handle(0x0014), lun(0)
[ 10.691760] mpt3sas_cm0: detecting: handle(0x0014), sas_address(0x500a0984dfa1fa20), phy(0)
[ 10.700124] mpt3sas_cm0: REPORT_LUNS: handle(0x0014), retries(0)
[ 10.706911] mpt3sas_cm0: TEST_UNIT_READY: handle(0x0014), lun(0)
[ 10.713191] mpt3sas_cm0: detecting: handle(0x0014), sas_address(0x500a0984dfa1fa20), phy(0)
[ 10.721557] mpt3sas_cm0: REPORT_LUNS: handle(0x0014), retries(0)
[ 10.728346] mpt3sas_cm0: TEST_UNIT_READY: handle(0x0014), lun(0)
[ 10.734852] scsi 0:0:3:0: Direct-Access DELL MD34xx 0825 PQ: 0 ANSI: 5
[ 10.743033] scsi 0:0:3:0: SSP: handle(0x0014), sas_addr(0x500a0984dfa1fa20), phy(0), device_name(0x500a0984dfa1fa20)
[ 10.753545] scsi 0:0:3:0: enclosure logical id(0x500605b00db90c00), slot(13)
[ 10.760677] scsi 0:0:3:0: enclosure level(0x0000), connector name( C3 )
[ 10.767400] scsi 0:0:3:0: serial_number(021825001369 )
[ 10.772805] scsi 0:0:3:0: qdepth(254), tagged(1), simple(0), ordered(0), scsi_level(6), cmd_que(1)
[ 10.793280] scsi 0:0:3:1: Direct-Access DELL MD34xx 0825 PQ: 0 ANSI: 5
[ 10.801442] scsi 0:0:3:1: SSP: handle(0x0014), sas_addr(0x500a0984dfa1fa20), phy(0), device_name(0x500a0984dfa1fa20)
[ 10.811950] scsi 0:0:3:1: enclosure logical id(0x500605b00db90c00), slot(13)
[ 10.819082] scsi 0:0:3:1: enclosure level(0x0000), connector name( C3 )
[ 10.825802] scsi 0:0:3:1: serial_number(021825001369 )
[ 10.831202] scsi 0:0:3:1: qdepth(254), tagged(1), simple(0), ordered(0), scsi_level(6), cmd_que(1)
[ 10.840391] scsi 0:0:3:1: Mode parameters changed
[ 10.856525] scsi 0:0:3:31: Direct-Access DELL Universal Xport 0825 PQ: 0 ANSI: 5
[ 10.864788] scsi 0:0:3:31: SSP: handle(0x0014), sas_addr(0x500a0984dfa1fa20), phy(0), device_name(0x500a0984dfa1fa20)
[ 10.875383] scsi 0:0:3:31: enclosure logical id(0x500605b00db90c00), slot(13)
[ 10.882604] scsi 0:0:3:31: enclosure level(0x0000), connector name( C3 )
[ 10.889407] scsi 0:0:3:31: serial_number(021825001369 )
[ 10.894896] scsi 0:0:3:31: qdepth(254), tagged(1), simple(0), ordered(0), scsi_level(6), cmd_que(1)
[ 10.915040] mpt3sas_cm0: detecting: handle(0x0015), sas_address(0x500a0984dfa20c14), phy(4)
[ 10.923395] mpt3sas_cm0: REPORT_LUNS: handle(0x0015), retries(0)
[ 10.929543] mpt3sas_cm0: REPORT_LUNS: handle(0x0015), retries(1)
[ 10.936542] mpt3sas_cm0: TEST_UNIT_READY: handle(0x0015), lun(0)
[ 10.942845] mpt3sas_cm0: detecting: handle(0x0015), sas_address(0x500a0984dfa20c14), phy(4)
[ 10.951210] mpt3sas_cm0: REPORT_LUNS: handle(0x0015), retries(0)
[ 10.957860] mpt3sas_cm0: TEST_UNIT_READY: handle(0x0015), lun(0)
[ 10.964430] scsi 0:0:4:0: Direct-Access DELL MD34xx 0825 PQ: 0 ANSI: 5
[ 10.972631] scsi 0:0:4:0: SSP: handle(0x0015), sas_addr(0x500a0984dfa20c14), phy(4), device_name(0x500a0984dfa20c14)
[ 10.983139] scsi 0:0:4:0: enclosure logical id(0x500605b00db90c00), slot(9)
[ 10.990185] scsi 0:0:4:0: enclosure level(0x0000), connector name( C2 )
[ 10.996904] scsi 0:0:4:0: serial_number(021825001558 )
[ 11.002304] scsi 0:0:4:0: qdepth(254), tagged(1), simple(0), ordered(0), scsi_level(6), cmd_que(1)
[ 11.025163] scsi 0:0:4:1: Direct-Access DELL MD34xx 0825 PQ: 0 ANSI: 5
[ 11.033333] scsi 0:0:4:1: SSP: handle(0x0015), sas_addr(0x500a0984dfa20c14), phy(4), device_name(0x500a0984dfa20c14)
[ 11.043851] scsi 0:0:4:1: enclosure logical id(0x500605b00db90c00), slot(9)
[ 11.050898] scsi 0:0:4:1: enclosure level(0x0000), connector name( C2 )
[ 11.057602] scsi 0:0:4:1: serial_number(021825001558 )
[ 11.063008] scsi 0:0:4:1: qdepth(254), tagged(1), simple(0), ordered(0), scsi_level(6), cmd_que(1)
[ 11.085544] scsi 0:0:4:31: Direct-Access DELL Universal Xport 0825 PQ: 0 ANSI: 5
[ 11.093799] scsi 0:0:4:31: SSP: handle(0x0015), sas_addr(0x500a0984dfa20c14), phy(4), device_name(0x500a0984dfa20c14)
[ 11.104398] scsi 0:0:4:31: enclosure logical id(0x500605b00db90c00), slot(9)
[ 11.111529] scsi 0:0:4:31: enclosure level(0x0000), connector name( C2 )
[ 11.118336] scsi 0:0:4:31: serial_number(021825001558 )
[ 11.123823] scsi 0:0:4:31: qdepth(254), tagged(1), simple(0), ordered(0), scsi_level(6), cmd_que(1)
[ 15.751438] mpt3sas_cm0: port enable: SUCCESS
[ 15.756368] scsi 0:0:1:0: rdac: LUN 0 (IOSHIP) (owned)
[ 15.761777] sd 0:0:1:0: [sda] 926167040 512-byte logical blocks: (474 GB/441 GiB)
[ 15.769283] sd 0:0:1:0: [sda] 4096-byte physical blocks
[ 15.774612] scsi 0:0:1:1: rdac: LUN 1 (IOSHIP) (unowned)
[ 15.780213] sd 0:0:1:0: [sda] Write Protect is off
[ 15.785031] sd 0:0:1:0: [sda] Mode Sense: 83 00 10 08
[ 15.785034] sd 0:0:1:1: [sdb] 37449707520 512-byte logical blocks: (19.1 TB/17.4 TiB)
[ 15.785226] sd 0:0:1:0: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA
[ 15.785262] scsi 0:0:1:2: rdac: LUN 2 (IOSHIP) (owned)
[ 15.785550] sd 0:0:1:2: [sdc] 37449707520 512-byte logical blocks: (19.1 TB/17.4 TiB)
[ 15.786072] sd 0:0:1:2: [sdc] Write Protect is off
[ 15.786074] sd 0:0:1:2: [sdc] Mode Sense: 83 00 10 08
[ 15.786137] scsi 0:0:2:0: rdac: LUN 0 (IOSHIP) (unowned)
[ 15.786258] sd 0:0:1:2: [sdc] Write cache: enabled, read cache: enabled, supports DPO and FUA
[ 15.786368] sd 0:0:2:0: [sdd] 926167040 512-byte logical blocks: (474 GB/441 GiB)
[ 15.786370] sd 0:0:2:0: [sdd] 4096-byte physical blocks
[ 15.786682] scsi 0:0:2:1: rdac: LUN 1 (IOSHIP) (owned)
[ 15.786959] sd 0:0:2:0: [sdd] Write Protect is off
[ 15.786960] sd 0:0:2:0: [sdd] Mode Sense: 83 00 10 08
[ 15.787050] sd 0:0:2:1: [sde] 37449707520 512-byte logical blocks: (19.1 TB/17.4 TiB)
[ 15.787234] sd 0:0:2:0: [sdd] Write cache: enabled, read cache: enabled, supports DPO and FUA
[ 15.787360]
scsi 0:0:2:2: rdac: LUN 2 (IOSHIP) (unowned) [ 15.787709] sd 0:0:2:2: [sdf] 37449707520 512-byte logical blocks: (19.1 TB/17.4 TiB) [ 15.787824] sd 0:0:2:1: [sde] Write Protect is off [ 15.787826] sd 0:0:2:1: [sde] Mode Sense: 83 00 10 08 [ 15.788026] scsi 0:0:3:0: rdac: LUN 0 (IOSHIP) (owned) [ 15.788052] sd 0:0:2:1: [sde] Write cache: enabled, read cache: enabled, supports DPO and FUA [ 15.788272] sd 0:0:1:0: [sda] Attached SCSI disk [ 15.788278] sd 0:0:3:0: [sdg] 37449707520 512-byte logical blocks: (19.1 TB/17.4 TiB) [ 15.788450] sd 0:0:2:2: [sdf] Write Protect is off [ 15.788451] sd 0:0:2:2: [sdf] Mode Sense: 83 00 10 08 [ 15.788641] scsi 0:0:3:1: rdac: LUN 1 (IOSHIP) (unowned) [ 15.788719] sd 0:0:2:2: [sdf] Write cache: enabled, read cache: enabled, supports DPO and FUA [ 15.788817] sd 0:0:3:0: [sdg] Write Protect is off [ 15.788818] sd 0:0:3:0: [sdg] Mode Sense: 83 00 10 08 [ 15.788894] sd 0:0:3:1: [sdh] 37449707520 512-byte logical blocks: (19.1 TB/17.4 TiB) [ 15.788919] sd 0:0:1:2: [sdc] Attached SCSI disk [ 15.789008] sd 0:0:3:0: [sdg] Write cache: enabled, read cache: enabled, supports DPO and FUA [ 15.789325] scsi 0:0:4:0: rdac: LUN 0 (IOSHIP) (unowned) [ 15.789527] sd 0:0:3:1: [sdh] Write Protect is off [ 15.789529] sd 0:0:3:1: [sdh] Mode Sense: 83 00 10 08 [ 15.789560] sd 0:0:4:0: [sdi] 37449707520 512-byte logical blocks: (19.1 TB/17.4 TiB) [ 15.789732] sd 0:0:3:1: [sdh] Write cache: enabled, read cache: enabled, supports DPO and FUA [ 15.789967] scsi 0:0:4:1: rdac: LUN 1 (IOSHIP) (owned) [ 15.790208] sd 0:0:4:0: [sdi] Write Protect is off [ 15.790210] sd 0:0:4:0: [sdi] Mode Sense: 83 00 10 08 [ 15.790255] sd 0:0:4:1: [sdj] 37449707520 512-byte logical blocks: (19.1 TB/17.4 TiB) [ 15.790554] sd 0:0:4:0: [sdi] Write cache: enabled, read cache: enabled, supports DPO and FUA [ 15.790583] sd 1:2:0:0: [sdk] 233308160 512-byte logical blocks: (119 GB/111 GiB) [ 15.790755] sd 1:2:0:0: [sdk] Write Protect is off [ 15.790757] sd 1:2:0:0: [sdk] Mode Sense: 1f 
00 10 08 [ 15.790789] sd 1:2:0:0: [sdk] Write cache: disabled, read cache: disabled, supports DPO and FUA [ 15.790827] sd 0:0:4:1: [sdj] Write Protect is off [ 15.790829] sd 0:0:4:1: [sdj] Mode Sense: 83 00 10 08 [ 15.791008] sd 0:0:4:1: [sdj] Write cache: enabled, read cache: enabled, supports DPO and FUA [ 15.791697] sd 0:0:2:1: [sde] Attached SCSI disk [ 15.791944] sd 0:0:2:0: [sdd] Attached SCSI disk [ 15.792372] sdk: sdk1 sdk2 sdk3 [ 15.792690] sd 1:2:0:0: [sdk] Attached SCSI disk [ 15.792786] sd 0:0:2:2: [sdf] Attached SCSI disk [ 15.793730] sd 0:0:3:0: [sdg] Attached SCSI disk [ 15.794063] sd 0:0:3:1: [sdh] Attached SCSI disk [ 15.794352] sd 0:0:4:1: [sdj] Attached SCSI disk [ 15.794792] sd 0:0:4:0: [sdi] Attached SCSI disk [ 16.088018] sd 0:0:1:1: [sdb] Write Protect is off [ 16.092811] sd 0:0:1:1: [sdb] Mode Sense: 83 00 10 08 [ 16.092947] sd 0:0:1:1: [sdb] Write cache: enabled, read cache: enabled, supports DPO and FUA [ 16.104233] sd 0:0:1:1: [sdb] Attached SCSI disk [ 16.399647] EXT4-fs (sdk2): mounted filesystem with ordered data mode. Opts: (null) [ 16.624173] systemd-journald[351]: Received SIGTERM from PID 1 (systemd). [ 16.653442] SELinux: Disabled at runtime. [ 16.657831] SELinux: Unregistering netfilter hooks [ 16.701468] type=1404 audit(1549517317.201:2): selinux=0 auid=4294967295 ses=4294967295 [ 16.729277] ip_tables: (C) 2000-2006 Netfilter Core Team [ 16.735350] systemd[1]: Inserted module 'ip_tables' [ 16.817760] EXT4-fs (sdk2): re-mounted. 
Opts: (null) [ 16.818750] Request for unknown module key 'Mellanox Technologies signing key: 61feb074fc7292f958419386ffdd9d5ca999e403' err -11 [ 16.836047] knem 1.1.3.90mlnx1: initialized [ 16.848257] systemd-journald[7864]: Received request to flush runtime journal from PID 1 [ 16.939106] ACPI Error: No handler for Region [SYSI] (ffff899587740b40) [IPMI] (20130517/evregion-162) [ 16.951700] ACPI Error: Region IPMI (ID=7) has no handler (20130517/exfldio-305) [ 16.951728] ACPI Error: Method parse/execution failed [\_SB_.PMI0._GHL] (Node ffff8968a9e7b5a0), AE_NOT_EXIST (20130517/psparse-536) [ 16.951738] ACPI Error: Method parse/execution failed [\_SB_.PMI0._PMC] (Node ffff8968a9e7b500), AE_NOT_EXIST (20130517/psparse-536) [ 16.951746] ACPI Exception: AE_NOT_EXIST, Evaluating _PMC (20130517/power_meter-753) [ 17.000757] piix4_smbus 0000:00:14.0: SMBus Host Controller at 0xb00, revision 0 [ 17.009185] piix4_smbus 0000:00:14.0: Using register 0x2e for SMBus port selection [ 17.009835] ipmi message handler version 39.2 [ 17.024553] ccp 0000:02:00.2: 3 command queues available [ 17.031494] ccp 0000:02:00.2: irq 238 for MSI/MSI-X [ 17.031526] ccp 0000:02:00.2: irq 239 for MSI/MSI-X [ 17.031580] ccp 0000:02:00.2: Queue 2 can access 4 LSB regions [ 17.037847] ccp 0000:02:00.2: Queue 3 can access 4 LSB regions [ 17.045065] ccp 0000:02:00.2: Queue 4 can access 4 LSB regions [ 17.052283] ccp 0000:02:00.2: Queue 0 gets LSB 4 [ 17.058286] ccp 0000:02:00.2: Queue 1 gets LSB 5 [ 17.064292] ccp 0000:02:00.2: Queue 2 gets LSB 6 [ 17.070861] ipmi device interface [ 17.071334] scsi 0:0:0:0: Attached scsi generic sg0 type 13 [ 17.071632] sd 0:0:1:0: Attached scsi generic sg1 type 0 [ 17.071705] sd 0:0:1:1: Attached scsi generic sg2 type 0 [ 17.071763] sd 0:0:1:2: Attached scsi generic sg3 type 0 [ 17.071807] scsi 0:0:1:31: Attached scsi generic sg4 type 0 [ 17.071866] sd 0:0:2:0: Attached scsi generic sg5 type 0 [ 17.071923] sd 0:0:2:1: Attached scsi generic sg6 type 0 [ 17.071983] sd 
0:0:2:2: Attached scsi generic sg7 type 0 [ 17.072035] scsi 0:0:2:31: Attached scsi generic sg8 type 0 [ 17.072081] sd 0:0:3:0: Attached scsi generic sg9 type 0 [ 17.072131] sd 0:0:3:1: Attached scsi generic sg10 type 0 [ 17.072181] scsi 0:0:3:31: Attached scsi generic sg11 type 0 [ 17.072226] sd 0:0:4:0: Attached scsi generic sg12 type 0 [ 17.072271] sd 0:0:4:1: Attached scsi generic sg13 type 0 [ 17.072793] scsi 0:0:4:31: Attached scsi generic sg14 type 0 [ 17.072854] sd 1:2:0:0: Attached scsi generic sg15 type 0 [ 17.077125] ccp 0000:02:00.2: enabled [ 17.077636] ccp 0000:03:00.1: 5 command queues available [ 17.077702] ccp 0000:03:00.1: irq 241 for MSI/MSI-X [ 17.077738] ccp 0000:03:00.1: Queue 0 can access 7 LSB regions [ 17.077740] ccp 0000:03:00.1: Queue 1 can access 7 LSB regions [ 17.077742] ccp 0000:03:00.1: Queue 2 can access 7 LSB regions [ 17.077744] ccp 0000:03:00.1: Queue 3 can access 7 LSB regions [ 17.077746] ccp 0000:03:00.1: Queue 4 can access 7 LSB regions [ 17.077747] ccp 0000:03:00.1: Queue 0 gets LSB 1 [ 17.077748] ccp 0000:03:00.1: Queue 1 gets LSB 2 [ 17.077749] ccp 0000:03:00.1: Queue 2 gets LSB 3 [ 17.077750] ccp 0000:03:00.1: Queue 3 gets LSB 4 [ 17.077751] ccp 0000:03:00.1: Queue 4 gets LSB 5 [ 17.097190] ccp 0000:03:00.1: enabled [ 17.097757] ccp 0000:41:00.2: 3 command queues available [ 17.097802] ccp 0000:41:00.2: irq 243 for MSI/MSI-X [ 17.097822] ccp 0000:41:00.2: irq 244 for MSI/MSI-X [ 17.097873] ccp 0000:41:00.2: Queue 2 can access 4 LSB regions [ 17.097875] ccp 0000:41:00.2: Queue 3 can access 4 LSB regions [ 17.097876] ccp 0000:41:00.2: Queue 4 can access 4 LSB regions [ 17.097878] ccp 0000:41:00.2: Queue 0 gets LSB 4 [ 17.097879] ccp 0000:41:00.2: Queue 1 gets LSB 5 [ 17.097880] ccp 0000:41:00.2: Queue 2 gets LSB 6 [ 17.109794] ccp 0000:41:00.2: enabled [ 17.109987] ccp 0000:42:00.1: 5 command queues available [ 17.110039] ccp 0000:42:00.1: irq 246 for MSI/MSI-X [ 17.110071] ccp 0000:42:00.1: Queue 0 can access 7 LSB regions 
[ 17.110073] ccp 0000:42:00.1: Queue 1 can access 7 LSB regions [ 17.110074] ccp 0000:42:00.1: Queue 2 can access 7 LSB regions [ 17.110076] ccp 0000:42:00.1: Queue 3 can access 7 LSB regions [ 17.110078] ccp 0000:42:00.1: Queue 4 can access 7 LSB regions [ 17.110080] ccp 0000:42:00.1: Queue 0 gets LSB 1 [ 17.110081] ccp 0000:42:00.1: Queue 1 gets LSB 2 [ 17.110082] ccp 0000:42:00.1: Queue 2 gets LSB 3 [ 17.110083] ccp 0000:42:00.1: Queue 3 gets LSB 4 [ 17.110084] ccp 0000:42:00.1: Queue 4 gets LSB 5 [ 17.114508] ccp 0000:42:00.1: enabled [ 17.114814] ccp 0000:85:00.2: 3 command queues available [ 17.114868] ccp 0000:85:00.2: irq 248 for MSI/MSI-X [ 17.114891] ccp 0000:85:00.2: irq 249 for MSI/MSI-X [ 17.114945] ccp 0000:85:00.2: Queue 2 can access 4 LSB regions [ 17.114946] ccp 0000:85:00.2: Queue 3 can access 4 LSB regions [ 17.114948] ccp 0000:85:00.2: Queue 4 can access 4 LSB regions [ 17.114950] ccp 0000:85:00.2: Queue 0 gets LSB 4 [ 17.114951] ccp 0000:85:00.2: Queue 1 gets LSB 5 [ 17.114952] ccp 0000:85:00.2: Queue 2 gets LSB 6 [ 17.123485] ccp 0000:85:00.2: enabled [ 17.123751] ccp 0000:86:00.1: 5 command queues available [ 17.123794] ccp 0000:86:00.1: irq 251 for MSI/MSI-X [ 17.123825] ccp 0000:86:00.1: Queue 0 can access 7 LSB regions [ 17.123827] ccp 0000:86:00.1: Queue 1 can access 7 LSB regions [ 17.123828] ccp 0000:86:00.1: Queue 2 can access 7 LSB regions [ 17.123830] ccp 0000:86:00.1: Queue 3 can access 7 LSB regions [ 17.123832] ccp 0000:86:00.1: Queue 4 can access 7 LSB regions [ 17.123834] ccp 0000:86:00.1: Queue 0 gets LSB 1 [ 17.123835] ccp 0000:86:00.1: Queue 1 gets LSB 2 [ 17.123836] ccp 0000:86:00.1: Queue 2 gets LSB 3 [ 17.123837] ccp 0000:86:00.1: Queue 3 gets LSB 4 [ 17.123838] ccp 0000:86:00.1: Queue 4 gets LSB 5 [ 17.130732] ccp 0000:86:00.1: enabled [ 17.131017] ccp 0000:c2:00.2: 3 command queues available [ 17.131065] ccp 0000:c2:00.2: irq 253 for MSI/MSI-X [ 17.131089] ccp 0000:c2:00.2: irq 254 for MSI/MSI-X [ 17.131135] ccp 
0000:c2:00.2: Queue 2 can access 4 LSB regions [ 17.131137] ccp 0000:c2:00.2: Queue 3 can access 4 LSB regions [ 17.131139] ccp 0000:c2:00.2: Queue 4 can access 4 LSB regions [ 17.131141] ccp 0000:c2:00.2: Queue 0 gets LSB 4 [ 17.131142] ccp 0000:c2:00.2: Queue 1 gets LSB 5 [ 17.131143] ccp 0000:c2:00.2: Queue 2 gets LSB 6 [ 17.176678] ccp 0000:c2:00.2: enabled [ 17.178043] ccp 0000:c3:00.1: 5 command queues available [ 17.178090] ccp 0000:c3:00.1: irq 256 for MSI/MSI-X [ 17.178119] ccp 0000:c3:00.1: Queue 0 can access 7 LSB regions [ 17.178121] ccp 0000:c3:00.1: Queue 1 can access 7 LSB regions [ 17.178123] ccp 0000:c3:00.1: Queue 2 can access 7 LSB regions [ 17.178125] ccp 0000:c3:00.1: Queue 3 can access 7 LSB regions [ 17.178127] ccp 0000:c3:00.1: Queue 4 can access 7 LSB regions [ 17.178128] ccp 0000:c3:00.1: Queue 0 gets LSB 1 [ 17.178129] ccp 0000:c3:00.1: Queue 1 gets LSB 2 [ 17.178130] ccp 0000:c3:00.1: Queue 2 gets LSB 3 [ 17.178131] ccp 0000:c3:00.1: Queue 3 gets LSB 4 [ 17.178132] ccp 0000:c3:00.1: Queue 4 gets LSB 5 [ 17.210393] device-mapper: uevent: version 1.0.3 [ 17.214227] input: PC Speaker as /devices/platform/pcspkr/input/input2 [ 17.216070] device-mapper: ioctl: 4.37.1-ioctl (2018-04-03) initialised: dm-devel@redhat.com [ 17.225268] ccp 0000:c3:00.1: enabled [ 17.453102] IPMI System Interface driver. 
[ 17.453126] ipmi_si ipmi_si.0: ipmi_platform: probing via SMBIOS [ 17.453129] ipmi_si: SMBIOS: io 0xca8 regsize 1 spacing 4 irq 10 [ 17.453130] ipmi_si: Adding SMBIOS-specified kcs state machine [ 17.453161] ipmi_si IPI0001:00: ipmi_platform: probing via ACPI [ 17.453190] ipmi_si IPI0001:00: [io 0x0ca8] regsize 1 spacing 4 irq 10 [ 17.453192] ipmi_si ipmi_si.0: Removing SMBIOS-specified kcs state machine in favor of ACPI [ 17.453193] ipmi_si: Adding ACPI-specified kcs state machine [ 17.453289] ipmi_si: Trying ACPI-specified kcs state machine at i/o address 0xca8, slave address 0x20, irq 10 [ 17.482399] ipmi_si IPI0001:00: The BMC does not support setting the recv irq bit, compensating, but the BMC needs to be fixed. [ 17.483623] ipmi_si IPI0001:00: Using irq 10 [ 17.485661] ipmi_si IPI0001:00: Found new BMC (man_id: 0x0002a2, prod_id: 0x0100, dev_id: 0x20) [ 17.497014] mpt3sas_cm0: log_info(0x31200205): originator(PL), code(0x20), sub_code(0x0205) [ 17.497241] sd 0:0:1:0: Embedded Enclosure Device [ 17.504492] sd 0:0:1:1: Embedded Enclosure Device [ 17.516245] sd 0:0:1:2: Embedded Enclosure Device [ 17.530169] scsi 0:0:1:31: Embedded Enclosure Device [ 17.538111] sd 0:0:2:0: Embedded Enclosure Device [ 17.543887] sd 0:0:2:1: Embedded Enclosure Device [ 17.550537] sd 0:0:2:2: Embedded Enclosure Device [ 17.558927] scsi 0:0:2:31: Embedded Enclosure Device [ 17.565625] ipmi_si IPI0001:00: IPMI kcs interface initialized [ 17.567209] sd 0:0:3:0: Embedded Enclosure Device [ 17.922070] cryptd: max_cpu_qlen set to 1000 [ 17.949730] sd 0:0:3:1: Embedded Enclosure Device [ 17.959587] scsi 0:0:3:31: Embedded Enclosure Device [ 17.965469] AVX2 version of gcm_enc/dec engaged. 
[ 17.969780] sd 0:0:4:0: Embedded Enclosure Device [ 17.972173] sd 0:0:4:1: Embedded Enclosure Device [ 17.974372] scsi 0:0:4:31: Embedded Enclosure Device [ 17.977293] ses 0:0:0:0: Attached Enclosure device [ 17.984520] AES CTR mode by8 optimization enabled [ 18.008762] alg: No test for __gcm-aes-aesni (__driver-gcm-aes-aesni) [ 18.015826] alg: No test for __generic-gcm-aes-aesni (__driver-generic-gcm-aes-aesni) [ 18.104743] kvm: Nested Paging enabled [ 18.117587] MCE: In-kernel MCE decoding enabled. [ 18.128918] AMD64 EDAC driver v3.4.0 [ 18.132527] EDAC amd64: DRAM ECC enabled. [ 18.136550] EDAC amd64: F17h detected (node 0). [ 18.141154] EDAC MC: UMC0 chip selects: [ 18.141157] EDAC amd64: MC: 0: 0MB 1: 0MB [ 18.145876] EDAC amd64: MC: 2: 32767MB 3: 32767MB [ 18.150590] EDAC amd64: MC: 4: 0MB 5: 0MB [ 18.155302] EDAC amd64: MC: 6: 0MB 7: 0MB [ 18.160032] EDAC MC: UMC1 chip selects: [ 18.160035] EDAC amd64: MC: 0: 0MB 1: 0MB [ 18.164751] EDAC amd64: MC: 2: 32767MB 3: 32767MB [ 18.169463] EDAC amd64: MC: 4: 0MB 5: 0MB [ 18.174180] EDAC amd64: MC: 6: 0MB 7: 0MB [ 18.178894] EDAC amd64: using x8 syndromes. [ 18.183093] EDAC amd64: MCT channel count: 2 [ 18.187589] EDAC MC0: Giving out device to 'amd64_edac' 'F17h': DEV 0000:00:18.3 [ 18.194995] EDAC amd64: DRAM ECC enabled. [ 18.200514] EDAC amd64: F17h detected (node 1). [ 18.205349] EDAC MC: UMC0 chip selects: [ 18.205353] EDAC amd64: MC: 0: 0MB 1: 0MB [ 18.210079] EDAC amd64: MC: 2: 32767MB 3: 32767MB [ 18.214808] EDAC amd64: MC: 4: 0MB 5: 0MB [ 18.219527] EDAC amd64: MC: 6: 0MB 7: 0MB [ 18.224243] EDAC MC: UMC1 chip selects: [ 18.224247] EDAC amd64: MC: 0: 0MB 1: 0MB [ 18.228970] EDAC amd64: MC: 2: 32767MB 3: 32767MB [ 18.233692] EDAC amd64: MC: 4: 0MB 5: 0MB [ 18.238412] EDAC amd64: MC: 6: 0MB 7: 0MB [ 18.243124] EDAC amd64: using x8 syndromes. 
[ 18.247323] EDAC amd64: MCT channel count: 2 [ 18.251940] EDAC MC1: Giving out device to 'amd64_edac' 'F17h': DEV 0000:00:19.3 [ 18.259550] EDAC amd64: DRAM ECC enabled. [ 18.263569] EDAC amd64: F17h detected (node 2). [ 18.268168] EDAC MC: UMC0 chip selects: [ 18.268172] EDAC amd64: MC: 0: 0MB 1: 0MB [ 18.272888] EDAC amd64: MC: 2: 32767MB 3: 32767MB [ 18.277603] EDAC amd64: MC: 4: 0MB 5: 0MB [ 18.282321] EDAC amd64: MC: 6: 0MB 7: 0MB [ 18.287047] EDAC MC: UMC1 chip selects: [ 18.287050] EDAC amd64: MC: 0: 0MB 1: 0MB [ 18.291767] EDAC amd64: MC: 2: 32767MB 3: 32767MB [ 18.296487] EDAC amd64: MC: 4: 0MB 5: 0MB [ 18.301203] EDAC amd64: MC: 6: 0MB 7: 0MB [ 18.305918] EDAC amd64: using x8 syndromes. [ 18.310113] EDAC amd64: MCT channel count: 2 [ 18.314581] EDAC MC2: Giving out device to 'amd64_edac' 'F17h': DEV 0000:00:1a.3 [ 18.321994] EDAC amd64: DRAM ECC enabled. [ 18.326105] EDAC amd64: F17h detected (node 3). [ 18.330769] EDAC MC: UMC0 chip selects: [ 18.330772] EDAC amd64: MC: 0: 0MB 1: 0MB [ 18.335491] EDAC amd64: MC: 2: 32767MB 3: 32767MB [ 18.340216] EDAC amd64: MC: 4: 0MB 5: 0MB [ 18.344935] EDAC amd64: MC: 6: 0MB 7: 0MB [ 18.349794] EDAC MC: UMC1 chip selects: [ 18.349798] EDAC amd64: MC: 0: 0MB 1: 0MB [ 18.354515] EDAC amd64: MC: 2: 32767MB 3: 32767MB [ 18.359226] EDAC amd64: MC: 4: 0MB 5: 0MB [ 18.363942] EDAC amd64: MC: 6: 0MB 7: 0MB [ 18.368659] EDAC amd64: using x8 syndromes. [ 18.372863] EDAC amd64: MCT channel count: 2 [ 18.377378] EDAC MC3: Giving out device to 'amd64_edac' 'F17h': DEV 0000:00:1b.3 [ 18.377543] dcdbas dcdbas: Dell Systems Management Base Driver (version 5.6.0-3.3) [ 18.392386] EDAC PCI0: Giving out device to module 'amd64_edac' controller 'EDAC PCI controller': DEV '0000:00:18.0' (POLLED) [ 47.313475] device-mapper: multipath round-robin: version 1.2.0 loaded [ 57.427902] Adding 4194300k swap on /dev/sdk3. Priority:-2 extents:1 across:4194300k FS [ 57.441413] FAT-fs (sdk1): Volume was not properly unmounted. 
Some data may be corrupt. Please run fsck. [ 57.480588] type=1305 audit(1549517357.979:3): audit_pid=17693 old=0 auid=4294967295 ses=4294967295 res=1 [ 57.500939] RPC: Registered named UNIX socket transport module. [ 57.508181] RPC: Registered udp transport module. [ 57.514273] RPC: Registered tcp transport module. [ 57.520364] RPC: Registered tcp NFSv4.1 backchannel transport module. [ 57.822192] Request for unknown module key 'Mellanox Technologies signing key: 61feb074fc7292f958419386ffdd9d5ca999e403' err -11 [ 57.900328] Request for unknown module key 'Mellanox Technologies signing key: 61feb074fc7292f958419386ffdd9d5ca999e403' err -11 [ 57.955918] Request for unknown module key 'Mellanox Technologies signing key: 61feb074fc7292f958419386ffdd9d5ca999e403' err -11 [ 58.083471] Request for unknown module key 'Mellanox Technologies signing key: 61feb074fc7292f958419386ffdd9d5ca999e403' err -11 [ 58.131642] Request for unknown module key 'Mellanox Technologies signing key: 61feb074fc7292f958419386ffdd9d5ca999e403' err -11 [ 58.214813] Request for unknown module key 'Mellanox Technologies signing key: 61feb074fc7292f958419386ffdd9d5ca999e403' err -11 [ 58.228871] Request for unknown module key 'Mellanox Technologies signing key: 61feb074fc7292f958419386ffdd9d5ca999e403' err -11 [ 58.245098] mlx5_core 0000:84:00.0: slow_pci_heuristic:5202:(pid 18072): Max link speed = 100000, PCI BW = 126016 [ 58.255431] mlx5_core 0000:84:00.0: MLX5E: StrdRq(0) RqSz(1024) StrdSz(64) RxCqeCmprss(0) [ 58.263622] mlx5_core 0000:84:00.0: MLX5E: StrdRq(0) RqSz(1024) StrdSz(64) RxCqeCmprss(0) [ 58.399005] Request for unknown module key 'Mellanox Technologies signing key: 61feb074fc7292f958419386ffdd9d5ca999e403' err -11 [ 58.411816] Request for unknown module key 'Mellanox Technologies signing key: 61feb074fc7292f958419386ffdd9d5ca999e403' err -11 [ 58.461318] Request for unknown module key 'Mellanox Technologies signing key: 61feb074fc7292f958419386ffdd9d5ca999e403' err -11 [ 58.510177] 
Request for unknown module key 'Mellanox Technologies signing key: 61feb074fc7292f958419386ffdd9d5ca999e403' err -11 [ 58.779449] tg3 0000:81:00.0: irq 257 for MSI/MSI-X [ 58.779472] tg3 0000:81:00.0: irq 258 for MSI/MSI-X [ 58.779483] tg3 0000:81:00.0: irq 259 for MSI/MSI-X [ 58.779500] tg3 0000:81:00.0: irq 260 for MSI/MSI-X [ 58.779510] tg3 0000:81:00.0: irq 261 for MSI/MSI-X [ 58.905595] IPv6: ADDRCONF(NETDEV_UP): em1: link is not ready [ 62.419248] tg3 0000:81:00.0 em1: Link is up at 1000 Mbps, full duplex [ 62.425782] tg3 0000:81:00.0 em1: Flow control is on for TX and on for RX [ 62.432572] tg3 0000:81:00.0 em1: EEE is enabled [ 62.437207] IPv6: ADDRCONF(NETDEV_CHANGE): em1: link becomes ready [ 63.142934] IPv6: ADDRCONF(NETDEV_UP): ib0: link is not ready [ 63.254801] IPv6: ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready [ 67.392034] FS-Cache: Loaded [ 67.421726] FS-Cache: Netfs 'nfs' registered for caching [ 67.430915] Key type dns_resolver registered [ 67.458995] NFS: Registering the id_resolver key type [ 67.464251] Key type id_resolver registered [ 67.469807] Key type id_legacy registered [ 377.138189] LNet: HW NUMA nodes: 4, HW CPU cores: 48, npartitions: 4 [ 377.146009] alg: No test for adler32 (adler32-zlib) [ 377.947572] Lustre: Lustre: Build Version: 2.12.0 [ 378.068951] LNet: Using FastReg for registration [ 378.069993] LNetError: 7271:0:(o2iblnd_cb.c:2469:kiblnd_passive_connect()) Can't accept conn from 10.0.10.203@o2ib7 on NA (ib0:0:10.0.10.51): bad dst nid 10.0.10.51@o2ib7 [ 378.101145] LNet: Added LNI 10.0.10.51@o2ib7 [8/256/0/180] [ 381.233070] LDISKFS-fs (dm-2): mounted filesystem with ordered data mode. 
Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc [ 381.549622] Lustre: MGS: Connection restored to 5f0406ba-a79c-d074-97a6-2ac37d10730b (at 0@lo) [ 383.678122] Lustre: MGS: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [ 392.859111] Lustre: MGS: Connection restored to 8744285d-b473-02b9-1c71-8a6310ad2b36 (at 10.0.10.101@o2ib7) [ 401.337160] Lustre: MGS: Connection restored to a6c5c1e2-0a91-b666-a166-90a251a0a891 (at 10.8.17.7@o2ib6) [ 405.869736] Lustre: MGS: Connection restored to e7be8b4b-1b39-c49a-d247-f70599f17ba0 (at 10.0.10.104@o2ib7) [ 405.879475] Lustre: Skipped 1 previous similar message [ 415.435803] Lustre: MGS: Connection restored to 4c0a6847-7a14-1c98-ab37-7777b07c3f81 (at 10.9.0.64@o2ib4) [ 415.445372] Lustre: Skipped 8 previous similar messages [ 431.466312] Lustre: MGS: Connection restored to 85591248-36ae-00b4-33b4-75872f10b588 (at 10.8.30.36@o2ib6) [ 431.475964] Lustre: Skipped 262 previous similar messages [ 463.508329] Lustre: MGS: Connection restored to 7d55a611-6993-ed8f-69a9-2367782a7615 (at 10.9.104.56@o2ib4) [ 463.518072] Lustre: Skipped 1061 previous similar messages [ 488.661912] LDISKFS-fs warning (device dm-4): ldiskfs_multi_mount_protect:321: MMP interval 42 higher than expected, please wait. [ 488.664139] LDISKFS-fs warning (device dm-0): ldiskfs_multi_mount_protect:321: MMP interval 42 higher than expected, please wait. [ 530.664608] LDISKFS-fs (dm-0): file extents enabled, maximum tree depth=5 [ 530.688592] LDISKFS-fs (dm-4): file extents enabled, maximum tree depth=5 [ 533.007299] LDISKFS-fs (dm-4): recovery complete [ 533.012121] LDISKFS-fs (dm-4): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,acl,no_mbcache,nodelalloc [ 533.415191] LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.20.9@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. 
[ 533.476856] Lustre: fir-MDT0000: Not available for connect from 10.8.2.1@o2ib6 (not set up) [ 533.485217] Lustre: Skipped 1 previous similar message [ 533.611895] LustreError: 11-0: fir-MDT0001-osp-MDT0000: operation mds_connect to node 10.0.10.52@o2ib7 failed: rc = -114 [ 533.679871] Lustre: fir-MDT0000: Imperative Recovery not enabled, recovery window 300-900 [ 533.697314] Lustre: fir-MDD0000: changelog on [ 533.748340] Lustre: fir-MDT0000: Will be in recovery for at least 5:00, or until 1332 clients reconnect [ 533.870012] LDISKFS-fs (dm-0): recovery complete [ 533.874835] LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,acl,no_mbcache,nodelalloc [ 533.927898] LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.22.35@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. [ 533.945179] LustreError: Skipped 20 previous similar messages [ 534.258833] Lustre: fir-MDT0002: Not available for connect from 10.9.107.52@o2ib4 (not set up) [ 534.267454] Lustre: Skipped 1 previous similar message [ 534.451011] LustreError: 11-0: fir-MDT0000-osp-MDT0002: operation mds_connect to node 0@lo failed: rc = -114 [ 534.460837] LustreError: Skipped 49 previous similar messages [ 534.506443] Lustre: fir-MDT0000: Connection restored to 10.0.10.51@o2ib7 (at 0@lo) [ 534.507482] Lustre: fir-MDT0002: Imperative Recovery not enabled, recovery window 300-900 [ 534.511678] Lustre: fir-MDD0002: changelog on [ 534.526555] Lustre: Skipped 2 previous similar messages [ 534.596517] Lustre: fir-MDT0002: Will be in recovery for at least 5:00, or until 1332 clients reconnect [ 541.463504] Lustre: fir-MDT0000: Denying connection for new client 85e8a876-efff-2831-db66-61f0415adaa0(at 10.8.21.19@o2ib6), waiting for 1332 known clients (372 recovered, 1 in progress, and 0 evicted) to recover in 9:30 [ 541.483122] Lustre: Skipped 1 previous similar message [ 559.540296] LustreError: 
137-5: fir-MDT0001_UUID: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server. [ 559.540652] LustreError: 11-0: fir-OST000e-osc-MDT0002: operation ost_connect to node 10.0.10.103@o2ib7 failed: rc = -16 [ 559.540654] LustreError: Skipped 98 previous similar messages [ 559.573156] LustreError: Skipped 11 previous similar messages [ 570.001885] Lustre: fir-MDT0002: Denying connection for new client 85e8a876-efff-2831-db66-61f0415adaa0(at 10.8.21.19@o2ib6), waiting for 1334 known clients (893 recovered, 0 in progress, and 0 evicted) to recover in 9:01 [ 570.021506] Lustre: Skipped 1 previous similar message [ 584.629179] LustreError: 11-0: fir-OST000e-osc-MDT0002: operation ost_connect to node 10.0.10.103@o2ib7 failed: rc = -16 [ 584.640066] LustreError: Skipped 47 previous similar messages [ 589.437099] Lustre: fir-MDT0002: Recovery over after 0:54, of 1334 clients 1334 recovered and 0 were evicted. [ 609.717658] LustreError: 11-0: fir-OST000c-osc-MDT0002: operation ost_connect to node 10.0.10.103@o2ib7 failed: rc = -16 [ 609.728532] LustreError: Skipped 47 previous similar messages [ 634.806304] LustreError: 11-0: fir-OST000c-osc-MDT0002: operation ost_connect to node 10.0.10.103@o2ib7 failed: rc = -16 [ 634.817175] LustreError: Skipped 47 previous similar messages [ 659.894911] LustreError: 11-0: fir-OST000d-osc-MDT0002: operation ost_connect to node 10.0.10.104@o2ib7 failed: rc = -16 [ 659.905790] LustreError: Skipped 47 previous similar messages [ 710.060092] Lustre: fir-OST000a-osc-MDT0000: Connection restored to 10.0.10.101@o2ib7 (at 10.0.10.101@o2ib7) [ 710.069947] Lustre: Skipped 2770 previous similar messages [ 710.075883] LustreError: 11-0: fir-OST000c-osc-MDT0002: operation ost_connect to node 10.0.10.103@o2ib7 failed: rc = -16 [ 710.086745] LustreError: Skipped 95 previous similar messages [ 738.857614] LustreError: 20641:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock 
callback timer expired after 149s: evicting client at 10.8.1.22@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8977f76e5100/0x223c295caf62cce1 lrc: 3/0,0 mode: PR/PR res: [0x200000406:0xb3:0x0].0x0 bits 0x13/0x0 rrc: 424 type: IBT flags: 0x60200400000020 nid: 10.8.1.22@o2ib6 remote: 0xffaf7c067449203e expref: 8 pid: 21074 timeout: 738 lvb_type: 0 [ 738.913053] LustreError: 21448:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.21.19@o2ib6: deadline 100:44s ago req@ffff8967546f5400 x1624784665370704/t0(0) o38->@:0/0 lens 520/0 e 0 to 0 dl 1549517995 ref 1 fl Interpret:/0/ffffffff rc 0/-1 [ 738.921468] Lustre: 21523:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (100:44s); client may timeout. req@ffff8977e347bf00 x1624784665370736/t0(0) o38->@:0/0 lens 520/0 e 0 to 0 dl 1549517995 ref 1 fl Interpret:/0/ffffffff rc 0/-1 [ 738.921504] Lustre: fir-MDT0002: Client 85e8a876-efff-2831-db66-61f0415adaa0 (at 10.8.21.19@o2ib6) reconnecting [ 738.952198] LustreError: 20673:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.9.107.68@o2ib4 arrived at 1549518039 with bad export cookie 2466892173948339025 [ 738.992075] LustreError: 21448:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 1 previous similar message [ 745.622037] Lustre: fir-MDT0000: Client 85e8a876-efff-2831-db66-61f0415adaa0 (at 10.8.21.19@o2ib6) reconnecting [ 745.632124] Lustre: Skipped 2 previous similar messages [ 785.338963] LustreError: 11-0: fir-OST0007-osc-MDT0002: operation ost_connect to node 10.0.10.102@o2ib7 failed: rc = -16 [ 785.349835] LustreError: Skipped 77 previous similar messages [ 1089.984411] LustreError: 21530:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1549518090, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8977e4faa400/0x223c295caf9c9944 lrc: 3/0,1 mode: --/PW res: 
[0x2c0003357:0x2d0:0x0].0x0 bits 0x40/0x0 rrc: 6 type: IBT flags: 0x40010080000000 nid: local remote: 0x0 expref: -99 pid: 21530 timeout: 0 lvb_type: 0
[ 1089.984461] LustreError: dumping log to /tmp/lustre-log.1549518390.21475
[ 1090.030531] LustreError: 21530:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message
[ 1540.134760] Lustre: 20795:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-150), not sending early reply req@ffff8977e2c47200 x1624748261185216/t25769804206(0) o36->bcdbcaa2-b5a0-6ff6-1390-a90accf35015@10.9.106.20@o2ib4:635/0 lens 504/2888 e 0 to 0 dl 1549518845 ref 2 fl Interpret:/0/0 rc 0/0
[ 1547.045804] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[ 1547.051368] Lustre: fir-MDT0002: Connection restored to ac7151a3-7cea-7afd-34ef-00451766b672 (at 10.9.106.20@o2ib4)
[ 1547.051370] Lustre: Skipped 63 previous similar messages
[ 1547.071540] Lustre: Skipped 1 previous similar message
[ 1992.536055] LNet: Service thread pid 21475 was inactive for 1202.52s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[ 1992.553080] Pid: 21475, comm: mdt00_034 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
[ 1992.562820] Call Trace:
[ 1992.565286] [] ldlm_completion_ast+0x63d/0x920 [ptlrpc]
[ 1992.572237] [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc]
[ 1992.579454] [] mdt_object_local_lock+0x50b/0xb20 [mdt]
[ 1992.586285] [] mdt_object_lock_internal+0x70/0x3e0 [mdt]
[ 1992.593304] [] mdt_getattr_name_lock+0x11d/0x1c30 [mdt]
[ 1992.600221] [] mdt_getattr_name+0xc4/0x2b0 [mdt]
[ 1992.606545] [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[ 1992.613490] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[ 1992.621218] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
[ 1992.627548] [] kthread+0xd1/0xe0
[ 1992.632475] [] ret_from_fork_nospec_begin+0xe/0x21
[ 1992.638954] [] 0xffffffffffffffff
[ 1992.643981] LustreError: dumping log to /tmp/lustre-log.1549519293.21475
[ 1993.699211] LNet: Service thread pid 21530 was inactive for 1203.69s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[ 1993.716249] Pid: 21530, comm: mdt01_063 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
[ 1993.725996] Call Trace:
[ 1993.728460] [] ldlm_completion_ast+0x63d/0x920 [ptlrpc]
[ 1993.735407] [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc]
[ 1993.742602] [] mdt_dom_discard_data+0x101/0x130 [mdt]
[ 1993.749370] [] mdt_reint_unlink+0x331/0x14a0 [mdt]
[ 1993.755854] [] mdt_reint_rec+0x83/0x210 [mdt]
[ 1993.761904] [] mdt_reint_internal+0x6e3/0xaf0 [mdt]
[ 1993.768473] [] mdt_reint+0x67/0x140 [mdt]
[ 1993.774166] [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[ 1993.781119] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[ 1993.788857] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
[ 1993.795193] [] kthread+0xd1/0xe0
[ 1993.800108] [] ret_from_fork_nospec_begin+0xe/0x21
[ 1993.806580] [] 0xffffffffffffffff
[ 2303.115878] Lustre: fir-MDT0002: Client bcdbcaa2-b5a0-6ff6-1390-a90accf35015 (at 10.9.106.20@o2ib4) reconnecting
[ 2303.126079] Lustre: fir-MDT0002: Connection restored to ac7151a3-7cea-7afd-34ef-00451766b672 (at 10.9.106.20@o2ib4)
[ 2303.136517] Lustre: Skipped 1 previous similar message
[ 3059.217654] Lustre: fir-MDT0002: Client bcdbcaa2-b5a0-6ff6-1390-a90accf35015 (at 10.9.106.20@o2ib4) reconnecting
[ 3059.227834] Lustre: Skipped 1 previous similar message
[ 3059.231510] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[ 3059.231512] Lustre: Skipped 1 previous similar message
[ 3815.305186] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[ 3815.315214] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[ 3815.325482] Lustre: Skipped 5 previous similar messages
[ 4571.385882] Lustre: fir-MDT0002: Client bcdbcaa2-b5a0-6ff6-1390-a90accf35015 (at 10.9.106.20@o2ib4) reconnecting
[ 4571.396063] Lustre: Skipped 1 previous similar message
[ 4571.401260] Lustre: fir-MDT0002: Connection restored to ac7151a3-7cea-7afd-34ef-00451766b672 (at 10.9.106.20@o2ib4)
[ 4571.411717] Lustre: Skipped 1 previous similar message
[ 5327.477614] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[ 5327.487617] Lustre: Skipped 1 previous similar message
[ 5327.491182] Lustre: fir-MDT0002: Connection restored to ac7151a3-7cea-7afd-34ef-00451766b672 (at 10.9.106.20@o2ib4)
[ 6083.555746] Lustre: fir-MDT0002: Client bcdbcaa2-b5a0-6ff6-1390-a90accf35015 (at 10.9.106.20@o2ib4) reconnecting
[ 6083.565949] Lustre: fir-MDT0002: Connection restored to ac7151a3-7cea-7afd-34ef-00451766b672 (at 10.9.106.20@o2ib4)
[ 6083.576384] Lustre: Skipped 1 previous similar message
[ 6839.650187] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[ 6839.656116] Lustre: fir-MDT0002: Connection restored to ac7151a3-7cea-7afd-34ef-00451766b672 (at 10.9.106.20@o2ib4)
[ 6839.670610] Lustre: Skipped 2 previous similar messages
[ 7595.722653] Lustre: fir-MDT0002: Client bcdbcaa2-b5a0-6ff6-1390-a90accf35015 (at 10.9.106.20@o2ib4) reconnecting
[ 7595.732855] Lustre: fir-MDT0002: Connection restored to ac7151a3-7cea-7afd-34ef-00451766b672 (at 10.9.106.20@o2ib4)
[ 7595.743292] Lustre: Skipped 1 previous similar message
[ 8351.822226] Lustre: fir-MDT0002: Client bcdbcaa2-b5a0-6ff6-1390-a90accf35015 (at 10.9.106.20@o2ib4) reconnecting
[ 8351.822757] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[ 8351.822759] Lustre: Skipped 1 previous similar message
[ 8351.847802] Lustre: Skipped 2 previous similar messages
[ 9107.896950] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[ 9107.906991] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[ 9107.917298] Lustre: Skipped 1 previous similar message
[ 9863.989731] Lustre: fir-MDT0002: Client bcdbcaa2-b5a0-6ff6-1390-a90accf35015 (at 10.9.106.20@o2ib4) reconnecting
[ 9863.996203] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[ 9863.996205] Lustre: Skipped 1 previous similar message
[ 9864.015296] Lustre: Skipped 2 previous similar messages
[10620.069422] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[10620.079453] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[10620.089721] Lustre: Skipped 1 previous similar message
[11376.163353] Lustre: fir-MDT0002: Client bcdbcaa2-b5a0-6ff6-1390-a90accf35015 (at 10.9.106.20@o2ib4) reconnecting
[11376.168691] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[11376.168694] Lustre: Skipped 1 previous similar message
[11376.188917] Lustre: Skipped 2 previous similar messages
[12132.241814] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[12132.251841] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[12132.262105] Lustre: Skipped 1 previous similar message
[12888.332133] Lustre: fir-MDT0002: Client bcdbcaa2-b5a0-6ff6-1390-a90accf35015 (at 10.9.106.20@o2ib4) reconnecting
[12888.341067] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[12888.341070] Lustre: Skipped 1 previous similar message
[12888.357737] Lustre: Skipped 2 previous similar messages
[13644.415224] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[13644.425264] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[13644.435525] Lustre: Skipped 1 previous similar message
[14400.503200] Lustre: fir-MDT0002: Client bcdbcaa2-b5a0-6ff6-1390-a90accf35015 (at 10.9.106.20@o2ib4) reconnecting
[14400.513382] Lustre: Skipped 1 previous similar message
[14400.514553] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[15156.588775] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[15156.598799] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[15156.609062] Lustre: Skipped 2 previous similar messages
[15338.885104] Lustre: fir-MDT0000: haven't heard from client f20c0003-c984-0982-12d5-e8f78b4db48d (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8977e6b74800, cur 1549532639 expire 1549532489 last 1549532412
[15682.893685] Lustre: fir-MDT0000: haven't heard from client a37a74c8-e508-3e8f-25dc-1c7078a63130 (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89667fa24000, cur 1549532983 expire 1549532833 last 1549532756
[15682.915393] Lustre: Skipped 2 previous similar messages
[15912.667848] Lustre: fir-MDT0002: Client bcdbcaa2-b5a0-6ff6-1390-a90accf35015 (at 10.9.106.20@o2ib4) reconnecting
[15912.678024] Lustre: Skipped 1 previous similar message
[15912.683197] Lustre: fir-MDT0002: Connection restored to ac7151a3-7cea-7afd-34ef-00451766b672 (at 10.9.106.20@o2ib4)
[15912.693648] Lustre: Skipped 7 previous similar messages
[15945.205205] Lustre: 21426:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549533234/real 1549533234] req@ffff897757e46f00 x1624787063029344/t0(0) o104->fir-MDT0000@10.8.24.24@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549533245 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[15956.232484] Lustre: 21426:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549533245/real 1549533245] req@ffff897757e46f00 x1624787063029344/t0(0) o104->fir-MDT0000@10.8.24.24@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549533256 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[15967.259761] Lustre: 21426:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549533256/real 1549533256] req@ffff897757e46f00 x1624787063029344/t0(0) o104->fir-MDT0000@10.8.24.24@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549533267 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[15978.287036] Lustre: 21426:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549533267/real 1549533267] req@ffff897757e46f00 x1624787063029344/t0(0) o104->fir-MDT0000@10.8.24.24@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549533278 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[15989.314315] Lustre: 21426:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549533278/real 1549533278] req@ffff897757e46f00 x1624787063029344/t0(0) o104->fir-MDT0000@10.8.24.24@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549533289 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[16000.341589] Lustre: 21426:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549533289/real 1549533289] req@ffff897757e46f00 x1624787063029344/t0(0) o104->fir-MDT0000@10.8.24.24@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549533300 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[16022.369143] Lustre: 21426:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549533311/real 1549533311] req@ffff897757e46f00 x1624787063029344/t0(0) o104->fir-MDT0000@10.8.24.24@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549533322 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[16022.396391] Lustre: 21426:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message
[16055.405970] Lustre: 21426:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549533344/real 1549533344] req@ffff897757e46f00 x1624787063029344/t0(0) o104->fir-MDT0000@10.8.24.24@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549533355 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[16055.433238] Lustre: 21426:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
[16088.443823] LustreError: 21426:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.24.24@o2ib6) failed to reply to blocking AST (req@ffff897757e46f00 x1624787063029344 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff8977a0e81b00/0x223c295d19bdca6a lrc: 4/0,0 mode: PR/PR res: [0x200000406:0x4a8:0x0].0x0 bits 0x13/0x0 rrc: 130 type: IBT flags: 0x60200400000020 nid: 10.8.24.24@o2ib6 remote: 0xe3817714210d7cc7 expref: 12 pid: 21573 timeout: 16226 lvb_type: 0
[16088.486510] LustreError: 138-a: fir-MDT0000: A client on nid 10.8.24.24@o2ib6 was evicted due to a lock blocking callback time out: rc -110
[16088.499051] LustreError: 20641:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 154s: evicting client at 10.8.24.24@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8977a0e81b00/0x223c295d19bdca6a lrc: 3/0,0 mode: PR/PR res: [0x200000406:0x4a8:0x0].0x0 bits 0x13/0x0 rrc: 130 type: IBT flags: 0x60200400000020 nid: 10.8.24.24@o2ib6 remote: 0xe3817714210d7cc7 expref: 13 pid: 21573 timeout: 0 lvb_type: 0
[16088.536583] LustreError: 20641:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 10 previous similar messages
[16142.906431] Lustre: MGS: haven't heard from client f2b07333-a142-0298-13a0-7c7df112f0eb (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8997517d1400, cur 1549533443 expire 1549533293 last 1549533216
[16142.927443] Lustre: Skipped 2 previous similar messages
[16484.884756] Lustre: 21548:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549533773/real 1549533773] req@ffff896754619200 x1624787063429616/t0(0) o104->fir-MDT0000@10.8.24.24@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549533784 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[16484.912004] Lustre: 21548:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages
[16552.925271] Lustre: fir-MDT0000: haven't heard from client ce3b08aa-1f21-97fa-2fc0-1c5b83680208 (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8986ee25f000, cur 1549533853 expire 1549533703 last 1549533626
[16552.946977] Lustre: Skipped 1 previous similar message
[16619.668438] Lustre: MGS: Connection restored to 38f6d14e-7b39-af46-3df0-aa87488916e0 (at 10.8.24.24@o2ib6)
[16619.678088] Lustre: Skipped 3 previous similar messages
[16668.761495] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[16668.771498] Lustre: Skipped 1 previous similar message
[16899.924163] Lustre: fir-MDT0000: haven't heard from client f20843e9-8b87-5b64-be62-e200335d8fff (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897641399c00, cur 1549534200 expire 1549534050 last 1549533973
[16899.945869] Lustre: Skipped 2 previous similar messages
[17212.932020] Lustre: fir-MDT0000: haven't heard from client f480bbcd-aa9c-8d55-c7b1-f2ddaaba27ee (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89762de29400, cur 1549534513 expire 1549534363 last 1549534286
[17212.953723] Lustre: Skipped 2 previous similar messages
[17284.347646] Lustre: MGS: Connection restored to 38f6d14e-7b39-af46-3df0-aa87488916e0 (at 10.8.24.24@o2ib6)
[17284.357306] Lustre: Skipped 7 previous similar messages
[17424.839521] Lustre: fir-MDT0002: Client bcdbcaa2-b5a0-6ff6-1390-a90accf35015 (at 10.9.106.20@o2ib4) reconnecting
[17583.941404] Lustre: fir-MDT0000: haven't heard from client 7b70a8c4-b344-a80f-9360-6a2e17e4980d (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897621fbf800, cur 1549534884 expire 1549534734 last 1549534657
[17583.963107] Lustre: Skipped 2 previous similar messages
[18180.924076] Lustre: fir-MDT0002: Client bcdbcaa2-b5a0-6ff6-1390-a90accf35015 (at 10.9.106.20@o2ib4) reconnecting
[18180.925331] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[18180.925333] Lustre: Skipped 7 previous similar messages
[18180.949742] Lustre: Skipped 2 previous similar messages
[18936.998707] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[18937.008731] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[18937.018997] Lustre: Skipped 1 previous similar message
[19222.982640] Lustre: fir-MDT0000: haven't heard from client 8d040535-0e79-ad29-dbbc-3b6976c4867e (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897617a69000, cur 1549536523 expire 1549536373 last 1549536296
[19223.004348] Lustre: Skipped 2 previous similar messages
[19581.991473] Lustre: fir-MDT0000: haven't heard from client 78da3f25-f186-ac81-b2a1-346311672eab (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8986aa2f4c00, cur 1549536882 expire 1549536732 last 1549536655
[19582.013184] Lustre: Skipped 2 previous similar messages
[19670.857134] Lustre: MGS: Connection restored to 38f6d14e-7b39-af46-3df0-aa87488916e0 (at 10.8.24.24@o2ib6)
[19670.866786] Lustre: Skipped 4 previous similar messages
[19693.095144] Lustre: fir-MDT0002: Client bcdbcaa2-b5a0-6ff6-1390-a90accf35015 (at 10.9.106.20@o2ib4) reconnecting
[19693.105331] Lustre: Skipped 2 previous similar messages
[19951.001005] Lustre: fir-MDT0000: haven't heard from client bddbd171-0f13-e864-77b6-6341af0c51ca (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8975e27d4c00, cur 1549537251 expire 1549537101 last 1549537024
[19951.022706] Lustre: Skipped 2 previous similar messages
[20320.010090] Lustre: fir-MDT0000: haven't heard from client 8bfa1314-e90a-f67f-85e8-84e2b31d0b2c (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896634af1800, cur 1549537620 expire 1549537470 last 1549537393
[20320.031801] Lustre: Skipped 2 previous similar messages
[20404.781430] Lustre: MGS: Connection restored to 38f6d14e-7b39-af46-3df0-aa87488916e0 (at 10.8.24.24@o2ib6)
[20404.791096] Lustre: Skipped 7 previous similar messages
[20449.171681] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[20699.019517] Lustre: fir-MDT0000: haven't heard from client e08b54a7-025c-3526-6854-0251a2a49583 (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8975c664e800, cur 1549537999 expire 1549537849 last 1549537772
[20699.041218] Lustre: Skipped 2 previous similar messages
[20897.095473] Lustre: 21519:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549538186/real 1549538186] req@ffff898745b6a700 x1624787066865344/t0(0) o104->fir-MDT0000@10.8.24.24@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549538197 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[20897.122732] Lustre: 21519:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 6 previous similar messages
[20919.133028] Lustre: 21519:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549538208/real 1549538208] req@ffff898745b6a700 x1624787066865344/t0(0) o104->fir-MDT0000@10.8.24.24@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549538219 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[20919.160302] Lustre: 21519:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message
[20952.169850] Lustre: 21519:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549538241/real 1549538241] req@ffff898745b6a700 x1624787066865344/t0(0) o104->fir-MDT0000@10.8.24.24@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549538252 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[20952.197103] Lustre: 21519:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages
[21018.208501] Lustre: 21519:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549538307/real 1549538307] req@ffff898745b6a700 x1624787066865344/t0(0) o104->fir-MDT0000@10.8.24.24@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549538318 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[21018.235750] Lustre: 21519:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 11 previous similar messages
[21040.246090] LustreError: 21519:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.24.24@o2ib6) failed to reply to blocking AST (req@ffff898745b6a700 x1624787066865344 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff8987b67fda00/0x223c295d3ca8c642 lrc: 4/0,0 mode: PR/PR res: [0x200000406:0x4a8:0x0].0x0 bits 0x13/0x0 rrc: 57 type: IBT flags: 0x60200400000020 nid: 10.8.24.24@o2ib6 remote: 0x70eaf6220347ae3a expref: 12 pid: 21695 timeout: 21177 lvb_type: 0
[21040.288678] LustreError: 138-a: fir-MDT0000: A client on nid 10.8.24.24@o2ib6 was evicted due to a lock blocking callback time out: rc -110
[21040.301208] LustreError: 20641:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 154s: evicting client at 10.8.24.24@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8987b67fda00/0x223c295d3ca8c642 lrc: 3/0,0 mode: PR/PR res: [0x200000406:0x4a8:0x0].0x0 bits 0x13/0x0 rrc: 57 type: IBT flags: 0x60200400000020 nid: 10.8.24.24@o2ib6 remote: 0x70eaf6220347ae3a expref: 13 pid: 21695 timeout: 0 lvb_type: 0
[21070.034731] Lustre: fir-MDT0002: haven't heard from client 600a714d-e74f-2562-4ff1-9128707c2ee9 (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8975cb626c00, cur 1549538370 expire 1549538220 last 1549538143
[21070.056440] Lustre: Skipped 2 previous similar messages
[21094.850958] Lustre: MGS: Connection restored to 38f6d14e-7b39-af46-3df0-aa87488916e0 (at 10.8.24.24@o2ib6)
[21094.860613] Lustre: Skipped 7 previous similar messages
[21205.249720] Lustre: fir-MDT0002: Client bcdbcaa2-b5a0-6ff6-1390-a90accf35015 (at 10.9.106.20@o2ib4) reconnecting
[21205.259901] Lustre: Skipped 2 previous similar messages
[21400.037207] Lustre: fir-MDT0000: haven't heard from client 4ea3c1ff-65c9-2592-26b4-c4a98dc229a4 (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8976773e2400, cur 1549538700 expire 1549538550 last 1549538473
[21400.058916] Lustre: Skipped 1 previous similar message
[21602.204163] Lustre: 21557:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549538891/real 1549538891] req@ffff8986fdb5bc00 x1624787067338880/t0(0) o104->fir-MDT0000@10.8.24.24@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549538902 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[21602.231433] Lustre: 21557:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages
[21744.048604] Lustre: fir-MDT0000: haven't heard from client 7fb21a83-f1e5-c002-0534-5852cdea3589 (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8975b829c000, cur 1549539044 expire 1549538894 last 1549538817
[21744.070324] Lustre: Skipped 2 previous similar messages
[21793.535147] Lustre: MGS: Connection restored to 38f6d14e-7b39-af46-3df0-aa87488916e0 (at 10.8.24.24@o2ib6)
[21793.544801] Lustre: Skipped 7 previous similar messages
[21918.190110] Lustre: 21770:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549539207/real 1549539207] req@ffff898673f97500 x1624787067560240/t0(0) o104->fir-MDT0000@10.8.24.24@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549539218 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[21918.217361] Lustre: 21770:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 12 previous similar messages
[21961.328787] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[22061.230702] LustreError: 21770:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.24.24@o2ib6) failed to reply to blocking AST (req@ffff898673f97500 x1624787067560240 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff89777c6a4800/0x223c295d41f590c1 lrc: 4/0,0 mode: PR/PR res: [0x200000406:0x4a8:0x0].0x0 bits 0x13/0x0 rrc: 130 type: IBT flags: 0x60200400000020 nid: 10.8.24.24@o2ib6 remote: 0xb9eea92428d1bdf3 expref: 12 pid: 21718 timeout: 22198 lvb_type: 0
[22061.273383] LustreError: 138-a: fir-MDT0000: A client on nid 10.8.24.24@o2ib6 was evicted due to a lock blocking callback time out: rc -110
[22061.285915] LustreError: 20641:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 154s: evicting client at 10.8.24.24@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff89777c6a4800/0x223c295d41f590c1 lrc: 3/0,0 mode: PR/PR res: [0x200000406:0x4a8:0x0].0x0 bits 0x13/0x0 rrc: 130 type: IBT flags: 0x60200400000020 nid: 10.8.24.24@o2ib6 remote: 0xb9eea92428d1bdf3 expref: 13 pid: 21718 timeout: 0 lvb_type: 0
[22631.069150] Lustre: fir-MDT0000: haven't heard from client adf9fc18-cbbb-2506-6b66-c717afcfcfea (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8986cdbcb800, cur 1549539931 expire 1549539781 last 1549539704
[22631.090851] Lustre: Skipped 4 previous similar messages
[22717.409766] Lustre: fir-MDT0002: Client bcdbcaa2-b5a0-6ff6-1390-a90accf35015 (at 10.9.106.20@o2ib4) reconnecting
[22717.412472] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[22717.412474] Lustre: Skipped 7 previous similar messages
[22717.435422] Lustre: Skipped 2 previous similar messages
[23031.055657] perf: interrupt took too long (2501 > 2500), lowering kernel.perf_event_max_sample_rate to 79000
[23470.186044] Lustre: 21067:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549540759/real 1549540759] req@ffff8976e9ad5700 x1624787068626464/t0(0) o104->fir-MDT0000@10.8.24.24@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549540770 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[23470.213297] Lustre: 21067:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 13 previous similar messages
[23473.487162] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[23473.497184] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[23473.507443] Lustre: Skipped 4 previous similar messages
[23536.224700] Lustre: 21067:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549540825/real 1549540825] req@ffff8976e9ad5700 x1624787068626464/t0(0) o104->fir-MDT0000@10.8.24.24@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549540836 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[23536.251953] Lustre: 21067:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages
[23536.262039] LustreError: 21067:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.24.24@o2ib6) returned error from blocking AST (req@ffff8976e9ad5700 x1624787068626464 status -107 rc -107), evict it ns: mdt-fir-MDT0000_UUID lock: ffff8977e2f7f2c0/0x223c295d4ba3dc47 lrc: 4/0,0 mode: PR/PR res: [0x200000406:0x4a8:0x0].0x0 bits 0x13/0x0 rrc: 106 type: IBT flags: 0x60200400000020 nid: 10.8.24.24@o2ib6 remote: 0x79f0b3ea555c3123 expref: 12 pid: 21661 timeout: 23684 lvb_type: 0
[23536.305083] LustreError: 138-a: fir-MDT0000: A client on nid 10.8.24.24@o2ib6 was evicted due to a lock blocking callback time out: rc -107
[23536.317626] LustreError: 20641:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 77s: evicting client at 10.8.24.24@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8977e2f7f2c0/0x223c295d4ba3dc47 lrc: 3/0,0 mode: PR/PR res: [0x200000406:0x4a8:0x0].0x0 bits 0x13/0x0 rrc: 106 type: IBT flags: 0x60200400000020 nid: 10.8.24.24@o2ib6 remote: 0x79f0b3ea555c3123 expref: 13 pid: 21661 timeout: 0 lvb_type: 0
[23549.094846] Lustre: MGS: haven't heard from client 9e1c285d-477a-8d2d-0cc8-77d0bb35fbc1 (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8997347e7c00, cur 1549540849 expire 1549540699 last 1549540622
[23549.115866] Lustre: Skipped 2 previous similar messages
[23729.179545] Lustre: 21324:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549541018/real 1549541018] req@ffff898671653300 x1624787068830432/t0(0) o104->fir-MDT0000@10.8.24.24@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549541029 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[24014.227698] Lustre: 21515:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549541303/real 1549541303] req@ffff8965e2645700 x1624787069046816/t0(0) o104->fir-MDT0000@10.8.24.24@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549541314 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[24014.254949] Lustre: 21515:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 11 previous similar messages
[24157.268305] LustreError: 21515:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.24.24@o2ib6) failed to reply to blocking AST (req@ffff8965e2645700 x1624787069046816 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff8977dc2c21c0/0x223c295d4eff77da lrc: 4/0,0 mode: PR/PR res: [0x200000406:0x4a8:0x0].0x0 bits 0x13/0x0 rrc: 130 type: IBT flags: 0x60200400000020 nid: 10.8.24.24@o2ib6 remote: 0x6f69ed9e325aeed3 expref: 12 pid: 21614 timeout: 24294 lvb_type: 0
[24157.310988] LustreError: 138-a: fir-MDT0000: A client on nid 10.8.24.24@o2ib6 was evicted due to a lock blocking callback time out: rc -110
[24157.323532] LustreError: 20641:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 154s: evicting client at 10.8.24.24@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8977dc2c21c0/0x223c295d4eff77da lrc: 3/0,0 mode: PR/PR res: [0x200000406:0x4a8:0x0].0x0 bits 0x13/0x0 rrc: 130 type: IBT flags: 0x60200400000020 nid: 10.8.24.24@o2ib6 remote: 0x6f69ed9e325aeed3 expref: 13 pid: 21614 timeout: 0 lvb_type: 0
[24164.110221] Lustre: MGS: haven't heard from client 2c6eae34-a10c-907a-613a-0db91f4e468e (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89972262ac00, cur 1549541464 expire 1549541314 last 1549541237
[24164.131236] Lustre: Skipped 4 previous similar messages
[24229.582855] Lustre: fir-MDT0002: Client bcdbcaa2-b5a0-6ff6-1390-a90accf35015 (at 10.9.106.20@o2ib4) reconnecting
[24229.586977] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[24229.586980] Lustre: Skipped 7 previous similar messages
[24229.608519] Lustre: Skipped 2 previous similar messages
[24985.660796] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[24985.670827] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[24985.681091] Lustre: Skipped 7 previous similar messages
[25741.759381] Lustre: fir-MDT0002: Client bcdbcaa2-b5a0-6ff6-1390-a90accf35015 (at 10.9.106.20@o2ib4) reconnecting
[25741.760553] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[25741.760556] Lustre: Skipped 1 previous similar message
[25741.784957] Lustre: Skipped 2 previous similar messages
[26081.089554] Lustre: 21604:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549543373/real 1549543373] req@ffff89759bb67b00 x1624787070569904/t0(0) o104->fir-MDT0000@10.8.24.24@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549543380 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[26081.116807] Lustre: 21604:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 13 previous similar messages
[26151.128307] Lustre: 21604:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549543443/real 1549543443] req@ffff89759bb67b00 x1624787070569904/t0(0) o104->fir-MDT0000@10.8.24.24@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549543450 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[26151.155578] Lustre: 21604:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 9 previous similar messages
[26221.166946] Lustre: fir-MDT0002: haven't heard from client 910d5119-59ba-6900-0b0c-2e5796ac7b53 (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8975687a8800, cur 1549543521 expire 1549543371 last 1549543294
[26221.188651] Lustre: Skipped 4 previous similar messages
[26497.834358] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[26497.844386] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[26497.854653] Lustre: Skipped 4 previous similar messages
[26564.172241] Lustre: fir-MDT0002: haven't heard from client 8608a1f0-2a8e-f729-2ef3-a76ade04485a (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89865fb4e800, cur 1549543864 expire 1549543714 last 1549543637
[26564.193946] Lustre: Skipped 2 previous similar messages
[26924.949727] Lustre: 21607:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549544213/real 1549544213] req@ffff89752ee51800 x1624787071196912/t0(0) o104->fir-MDT0000@10.8.24.24@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549544224 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[26924.976983] Lustre: 21607:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 10 previous similar messages
[26946.987281] Lustre: 21607:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549544235/real 1549544235] req@ffff89752ee51800 x1624787071196912/t0(0) o104->fir-MDT0000@10.8.24.24@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549544246 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[26947.014534] Lustre: 21607:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message
[26980.024111] Lustre: 21607:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549544268/real 1549544268] req@ffff89752ee51800 x1624787071196912/t0(0) o104->fir-MDT0000@10.8.24.24@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549544279 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[26980.051358] Lustre: 21607:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
[27046.062763] Lustre: 21607:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549544334/real 1549544334] req@ffff89752ee51800 x1624787071196912/t0(0) o104->fir-MDT0000@10.8.24.24@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549544345 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[27046.090010] Lustre: 21607:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages
[27062.186350] Lustre: fir-MDT0000: haven't heard from client db4864c8-555c-d477-7b5b-7776f406e20c (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89669721a800, cur 1549544362 expire 1549544212 last 1549544135
[27062.208095] Lustre: Skipped 2 previous similar messages
[27102.671298] Lustre: MGS: Connection restored to 38f6d14e-7b39-af46-3df0-aa87488916e0 (at 10.8.24.24@o2ib6)
[27102.680960] Lustre: Skipped 4 previous similar messages
[27214.715011] Lustre: 21652:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549544503/real 1549544503] req@ffff89759e798000 x1624787071406800/t0(0) o104->fir-MDT0000@10.8.24.24@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549544514 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[27214.742267] Lustre: 21652:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message
[27253.930838] Lustre: fir-MDT0002: Client bcdbcaa2-b5a0-6ff6-1390-a90accf35015 (at 10.9.106.20@o2ib4) reconnecting
[27253.941013] Lustre: Skipped 2 previous similar messages
[27357.754612] LustreError: 21652:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.24.24@o2ib6) failed to reply to blocking AST (req@ffff89759e798000 x1624787071406800 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff8977e1cf5c40/0x223c295d615978b1 lrc: 4/0,0 mode: PR/PR res: [0x200000406:0x4a8:0x0].0x0 bits 0x13/0x0 rrc: 130 type: IBT flags: 0x60200400000020 nid: 10.8.24.24@o2ib6 remote: 0x59c31ca28cde19ec expref: 12 pid: 21719 timeout: 27495 lvb_type: 0
[27357.797292] LustreError: 138-a: fir-MDT0000: A client on nid 10.8.24.24@o2ib6 was evicted due to a lock blocking callback time out: rc -110
[27357.809826] LustreError: 20641:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 154s: evicting client at 10.8.24.24@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8977e1cf5c40/0x223c295d615978b1 lrc: 3/0,0 mode: PR/PR res: [0x200000406:0x4a8:0x0].0x0 bits 0x13/0x0 rrc: 130 type: IBT flags: 0x60200400000020 nid: 10.8.24.24@o2ib6 remote: 0x59c31ca28cde19ec expref: 13 pid: 21719 timeout: 0 lvb_type: 0
[27382.187767] Lustre: fir-MDT0002: haven't heard from client 532e6a21-46fc-150e-8a3b-4cf1e4fa60a1 (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8975cf37dc00, cur 1549544682 expire 1549544532 last 1549544455
[27382.209465] Lustre: Skipped 2 previous similar messages
[27988.674541] Lustre: MGS: Connection restored to 38f6d14e-7b39-af46-3df0-aa87488916e0 (at 10.8.24.24@o2ib6)
[27988.684202] Lustre: Skipped 7 previous similar messages
[28010.008858] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[28544.216366] Lustre: fir-MDT0000: haven't heard from client 8a617574-5adb-b585-6cc0-4971e045b67c (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89860abe5800, cur 1549545844 expire 1549545694 last 1549545617
[28544.238072] Lustre: Skipped 4 previous similar messages
[28766.092639] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[28766.100507] Lustre: fir-MDT0002: Connection restored to ac7151a3-7cea-7afd-34ef-00451766b672 (at 10.9.106.20@o2ib4)
[28766.100509] Lustre: Skipped 7 previous similar messages
[28766.118293] Lustre: Skipped 2 previous similar messages
[29277.104754] Lustre: 21672:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549546565/real 1549546565] req@ffff89751bae9800 x1624787072916720/t0(0) o104->fir-MDT0000@10.8.24.24@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549546576 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[29277.132005] Lustre: 21672:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 13 previous similar messages
[29310.142582] Lustre: 21672:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549546598/real 1549546598] req@ffff89751bae9800 x1624787072916720/t0(0) o104->fir-MDT0000@10.8.24.24@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549546609 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[29310.169830] Lustre: 21672:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
[29376.181239] Lustre: 21672:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549546664/real 1549546664] req@ffff89751bae9800 x1624787072916720/t0(0) o104->fir-MDT0000@10.8.24.24@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549546675 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[29376.208493] Lustre: 21672:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages
[29420.219364] LustreError: 21672:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.24.24@o2ib6) failed to reply to blocking AST (req@ffff89751bae9800 x1624787072916720 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock:
ffff8977e1a44c80/0x223c295d6d532d0f lrc: 4/0,0 mode: PR/PR res: [0x200000406:0x4a8:0x0].0x0 bits 0x13/0x0 rrc: 130 type: IBT flags: 0x60200400000020 nid: 10.8.24.24@o2ib6 remote: 0x47087558d670f0d4 expref: 12 pid: 21268 timeout: 29557 lvb_type: 0 [29420.262040] LustreError: 138-a: fir-MDT0000: A client on nid 10.8.24.24@o2ib6 was evicted due to a lock blocking callback time out: rc -110 [29420.274572] LustreError: 20641:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 155s: evicting client at 10.8.24.24@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8977e1a44c80/0x223c295d6d532d0f lrc: 3/0,0 mode: PR/PR res: [0x200000406:0x4a8:0x0].0x0 bits 0x13/0x0 rrc: 130 type: IBT flags: 0x60200400000020 nid: 10.8.24.24@o2ib6 remote: 0x47087558d670f0d4 expref: 13 pid: 21268 timeout: 0 lvb_type: 0 [29442.239513] Lustre: fir-MDT0002: haven't heard from client e3f4db5d-431b-6a68-664a-8ff476ffd4bb (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89751929ac00, cur 1549546742 expire 1549546592 last 1549546515 [29442.261219] Lustre: Skipped 2 previous similar messages [29494.761572] Lustre: MGS: Connection restored to 38f6d14e-7b39-af46-3df0-aa87488916e0 (at 10.8.24.24@o2ib6) [29494.771230] Lustre: Skipped 1 previous similar message [29522.166927] Lustre: fir-MDT0002: Client bcdbcaa2-b5a0-6ff6-1390-a90accf35015 (at 10.9.106.20@o2ib4) reconnecting [29972.059197] Lustre: 21571:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549547260/real 1549547260] req@ffff8974f6b7e900 x1624787073450736/t0(0) o104->fir-MDT0000@10.8.24.24@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549547271 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 [29972.086452] Lustre: 21571:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages [30090.289261] Lustre: fir-MDT0000: haven't heard from client 30f8154d-ab3e-afdb-9c5f-57a683ce3771 (at 10.8.24.24@o2ib6) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff898611f1b000, cur 1549547390 expire 1549547240 last 1549547163 [30090.310962] Lustre: Skipped 7 previous similar messages [30157.922916] Lustre: MGS: Connection restored to 38f6d14e-7b39-af46-3df0-aa87488916e0 (at 10.8.24.24@o2ib6) [30157.932573] Lustre: Skipped 10 previous similar messages [30278.253969] Lustre: fir-MDT0002: Client bcdbcaa2-b5a0-6ff6-1390-a90accf35015 (at 10.9.106.20@o2ib4) reconnecting [30278.264145] Lustre: Skipped 1 previous similar message [30281.887976] Lustre: 21607:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549547570/real 1549547570] req@ffff8977e1c38f00 x1624787073663168/t0(0) o104->fir-MDT0000@10.8.24.24@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549547581 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 [30281.915228] Lustre: 21607:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 10 previous similar messages [30424.928599] LustreError: 21607:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.24.24@o2ib6) failed to reply to blocking AST (req@ffff8977e1c38f00 x1624787073663168 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff897752693180/0x223c295d724ae80f lrc: 4/0,0 mode: PR/PR res: [0x200000406:0x4a8:0x0].0x0 bits 0x13/0x0 rrc: 130 type: IBT flags: 0x60200400000020 nid: 10.8.24.24@o2ib6 remote: 0x668a4c37b112c4bb expref: 12 pid: 21652 timeout: 30562 lvb_type: 0 [30424.971281] LustreError: 138-a: fir-MDT0000: A client on nid 10.8.24.24@o2ib6 was evicted due to a lock blocking callback time out: rc -110 [30424.983810] LustreError: 20641:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 154s: evicting client at 10.8.24.24@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff897752693180/0x223c295d724ae80f lrc: 3/0,0 mode: PR/PR res: [0x200000406:0x4a8:0x0].0x0 bits 0x13/0x0 rrc: 130 type: IBT flags: 0x60200400000020 nid: 10.8.24.24@o2ib6 remote: 0x668a4c37b112c4bb expref: 13 pid: 21652 timeout: 0 lvb_type: 
0 [30796.277883] Lustre: 21721:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549548085/real 1549548085] req@ffff8975c63aa400 x1624787073993776/t0(0) o104->fir-MDT0000@10.8.24.24@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549548096 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [30796.305139] Lustre: 21721:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 16 previous similar messages [30805.283660] Lustre: MGS: haven't heard from client d90dbbd9-489b-a070-1227-a7b4170cfe2a (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899702769000, cur 1549548105 expire 1549547955 last 1549547878 [30805.304671] Lustre: Skipped 4 previous similar messages [30840.943510] Lustre: MGS: Connection restored to 38f6d14e-7b39-af46-3df0-aa87488916e0 (at 10.8.24.24@o2ib6) [30840.953164] Lustre: Skipped 7 previous similar messages [31034.344228] Lustre: fir-MDT0002: Client bcdbcaa2-b5a0-6ff6-1390-a90accf35015 (at 10.9.106.20@o2ib4) reconnecting [31034.354409] Lustre: Skipped 2 previous similar messages [31430.307991] Lustre: MGS: haven't heard from client 2a63688c-8d9a-c981-4095-11f3e69b50a1 (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff89660f760c00, cur 1549548730 expire 1549548580 last 1549548503 [31430.329001] Lustre: Skipped 5 previous similar messages [31486.969462] Lustre: MGS: Connection restored to 38f6d14e-7b39-af46-3df0-aa87488916e0 (at 10.8.24.24@o2ib6) [31486.979119] Lustre: Skipped 7 previous similar messages [31660.413568] Lustre: 21568:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549548949/real 1549548949] req@ffff897532a2a400 x1624787074638400/t0(0) o104->fir-MDT0000@10.8.24.24@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549548960 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 [31790.419206] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [31803.444178] LustreError: 21568:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.24.24@o2ib6) failed to reply to blocking AST (req@ffff897532a2a400 x1624787074638400 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff8976c070fbc0/0x223c295d7c02df2d lrc: 4/0,0 mode: PR/PR res: [0x200000406:0x4a8:0x0].0x0 bits 0x13/0x0 rrc: 130 type: IBT flags: 0x60200400000020 nid: 10.8.24.24@o2ib6 remote: 0xdc2bc071830383a0 expref: 12 pid: 21568 timeout: 31940 lvb_type: 0 [31803.486863] LustreError: 138-a: fir-MDT0000: A client on nid 10.8.24.24@o2ib6 was evicted due to a lock blocking callback time out: rc -110 [31803.499396] LustreError: 20641:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 154s: evicting client at 10.8.24.24@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8976c070fbc0/0x223c295d7c02df2d lrc: 3/0,0 mode: PR/PR res: [0x200000406:0x4a8:0x0].0x0 bits 0x13/0x0 rrc: 130 type: IBT flags: 0x60200400000020 nid: 10.8.24.24@o2ib6 remote: 0xdc2bc071830383a0 expref: 13 pid: 21568 timeout: 0 lvb_type: 0 [32196.316809] Lustre: fir-MDT0002: haven't heard from client fb7a63b3-a2e8-8cb2-cc04-e373185c10f0 (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff89758b3c9400, cur 1549549496 expire 1549549346 last 1549549269 [32196.338514] Lustre: Skipped 4 previous similar messages [32245.697760] Lustre: MGS: Connection restored to 38f6d14e-7b39-af46-3df0-aa87488916e0 (at 10.8.24.24@o2ib6) [32245.707414] Lustre: Skipped 7 previous similar messages [32365.512257] Lustre: 21480:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549549654/real 1549549654] req@ffff8977e1cd8300 x1624787075185456/t0(0) o104->fir-MDT0000@10.8.24.24@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549549665 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 [32365.539526] Lustre: 21480:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 13 previous similar messages [32508.552873] LustreError: 21480:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.24.24@o2ib6) failed to reply to blocking AST (req@ffff8977e1cd8300 x1624787075185456 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff8967fedf4140/0x223c295d80a4c52d lrc: 4/0,0 mode: PR/PR res: [0x200000406:0x4a8:0x0].0x0 bits 0x13/0x0 rrc: 130 type: IBT flags: 0x60200400000020 nid: 10.8.24.24@o2ib6 remote: 0xf87512c534bfb28b expref: 12 pid: 21544 timeout: 32645 lvb_type: 0 [32508.595552] LustreError: 138-a: fir-MDT0000: A client on nid 10.8.24.24@o2ib6 was evicted due to a lock blocking callback time out: rc -110 [32508.608097] LustreError: 20641:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 154s: evicting client at 10.8.24.24@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8967fedf4140/0x223c295d80a4c52d lrc: 3/0,0 mode: PR/PR res: [0x200000406:0x4a8:0x0].0x0 bits 0x13/0x0 rrc: 130 type: IBT flags: 0x60200400000020 nid: 10.8.24.24@o2ib6 remote: 0xf87512c534bfb28b expref: 13 pid: 21544 timeout: 0 lvb_type: 0 [32546.497442] Lustre: fir-MDT0002: Client bcdbcaa2-b5a0-6ff6-1390-a90accf35015 (at 10.9.106.20@o2ib4) reconnecting [32546.507619] Lustre: Skipped 2 previous similar messages [32891.325659] Lustre: fir-MDT0000: 
haven't heard from client fb7ab70f-b5a9-5ed3-f206-37559857d5a7 (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8966a7f17000, cur 1549550191 expire 1549550041 last 1549549964 [32891.347366] Lustre: Skipped 4 previous similar messages [32912.324235] Lustre: MGS: Connection restored to 38f6d14e-7b39-af46-3df0-aa87488916e0 (at 10.8.24.24@o2ib6) [32912.333889] Lustre: Skipped 7 previous similar messages [33302.576344] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [33536.692835] Lustre: MGS: Connection restored to 50874c10-ad62-14a6-66f7-8fc40b97ec73 (at 10.8.20.6@o2ib6) [33536.702406] Lustre: Skipped 7 previous similar messages [33539.342969] Lustre: MGS: haven't heard from client 7f063833-e7fc-b85b-e006-438f40b8dfc2 (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8996faf07800, cur 1549550839 expire 1549550689 last 1549550612 [33539.363982] Lustre: Skipped 5 previous similar messages [33985.294908] Lustre: 21594:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549551273/real 1549551273] req@ffff8985c0361e00 x1624787076468064/t0(0) o104->fir-MDT0000@10.8.24.24@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549551284 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 [33985.322157] Lustre: 21594:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 13 previous similar messages [34058.650551] Lustre: fir-MDT0002: Client bcdbcaa2-b5a0-6ff6-1390-a90accf35015 (at 10.9.106.20@o2ib4) reconnecting [34058.660728] Lustre: Skipped 2 previous similar messages [34062.333844] Lustre: 21594:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549551350/real 1549551350] req@ffff8985c0361e00 x1624787076468064/t0(0) o104->fir-MDT0000@10.8.24.24@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549551361 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [34062.361131] Lustre: 
21594:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 6 previous similar messages [34128.372516] LustreError: 21594:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.24.24@o2ib6) failed to reply to blocking AST (req@ffff8985c0361e00 x1624787076468064 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff8976dba460c0/0x223c295d8c90e48a lrc: 4/0,0 mode: PR/PR res: [0x200000406:0x4a8:0x0].0x0 bits 0x13/0x0 rrc: 130 type: IBT flags: 0x60200400000020 nid: 10.8.24.24@o2ib6 remote: 0x679d1001b2b07514 expref: 16 pid: 21632 timeout: 34265 lvb_type: 0 [34128.415216] LustreError: 138-a: fir-MDT0000: A client on nid 10.8.24.24@o2ib6 was evicted due to a lock blocking callback time out: rc -110 [34128.427752] LustreError: 20641:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 155s: evicting client at 10.8.24.24@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8976dba460c0/0x223c295d8c90e48a lrc: 3/0,0 mode: PR/PR res: [0x200000406:0x4a8:0x0].0x0 bits 0x13/0x0 rrc: 130 type: IBT flags: 0x60200400000020 nid: 10.8.24.24@o2ib6 remote: 0x679d1001b2b07514 expref: 17 pid: 21632 timeout: 0 lvb_type: 0 [34154.360643] Lustre: fir-MDT0002: haven't heard from client cb0fe19f-13fd-9476-a22a-9ecc4687c91e (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff897608b7a400, cur 1549551454 expire 1549551304 last 1549551227 [34154.382348] Lustre: Skipped 2 previous similar messages [34173.821173] Lustre: MGS: Connection restored to 38f6d14e-7b39-af46-3df0-aa87488916e0 (at 10.8.24.24@o2ib6) [34173.830829] Lustre: Skipped 10 previous similar messages [34333.357638] Lustre: 21590:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549551625/real 1549551625] req@ffff8985e46af800 x1624787076730032/t0(0) o104->fir-MDT0000@10.8.24.24@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549551632 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 [34333.384895] Lustre: 21590:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 6 previous similar messages [34480.398369] LustreError: 21590:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.24.24@o2ib6) failed to reply to blocking AST (req@ffff8985e46af800 x1624787076730032 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff8977516660c0/0x223c295d8e3050b2 lrc: 4/0,0 mode: PR/PR res: [0x200000007:0x1:0x0].0x0 bits 0x13/0x0 rrc: 880 type: IBT flags: 0x60200400000020 nid: 10.8.24.24@o2ib6 remote: 0xf2c4806f094230ea expref: 11 pid: 21491 timeout: 34621 lvb_type: 0 [34480.440888] LustreError: 138-a: fir-MDT0000: A client on nid 10.8.24.24@o2ib6 was evicted due to a lock blocking callback time out: rc -110 [34480.453434] LustreError: 20641:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 155s: evicting client at 10.8.24.24@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8977516660c0/0x223c295d8e3050b2 lrc: 3/0,0 mode: PR/PR res: [0x200000007:0x1:0x0].0x0 bits 0x13/0x0 rrc: 880 type: IBT flags: 0x60200400000020 nid: 10.8.24.24@o2ib6 remote: 0xf2c4806f094230ea expref: 12 pid: 21491 timeout: 0 lvb_type: 0 [34814.732716] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [34814.741173] Lustre: fir-MDT0002: Connection restored to ac7151a3-7cea-7afd-34ef-00451766b672 (at 
10.9.106.20@o2ib4) [34814.741175] Lustre: Skipped 5 previous similar messages [34814.758369] Lustre: Skipped 1 previous similar message [34855.383958] Lustre: MGS: haven't heard from client ad136d4a-ab69-5aa5-1391-97075b6ee568 (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8997066e3800, cur 1549552155 expire 1549552005 last 1549551928 [34855.404975] Lustre: Skipped 3 previous similar messages [35570.805060] Lustre: fir-MDT0002: Client bcdbcaa2-b5a0-6ff6-1390-a90accf35015 (at 10.9.106.20@o2ib4) reconnecting [35570.815272] Lustre: fir-MDT0002: Connection restored to ac7151a3-7cea-7afd-34ef-00451766b672 (at 10.9.106.20@o2ib4) [35570.825709] Lustre: Skipped 4 previous similar messages [35887.400669] Lustre: fir-MDT0000: haven't heard from client 6ff78f59-4d7f-d121-1724-154727e4138f (at 10.9.0.64@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8977e68d1000, cur 1549553187 expire 1549553037 last 1549552960 [35887.422291] Lustre: Skipped 2 previous similar messages [36326.905034] Lustre: fir-MDT0002: Client bcdbcaa2-b5a0-6ff6-1390-a90accf35015 (at 10.9.106.20@o2ib4) reconnecting [36326.910070] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [36326.910073] Lustre: Skipped 4 previous similar messages [36326.930712] Lustre: Skipped 2 previous similar messages [37082.983008] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [37082.993073] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [37083.003370] Lustre: Skipped 1 previous similar message [37184.215174] Lustre: 21400:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549554476/real 1549554476] req@ffff898645acb600 x1624787078891808/t0(0) o104->fir-MDT0000@10.8.24.24@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549554483 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 
[37184.242425] Lustre: 21400:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 21 previous similar messages
[37225.629217] Lustre: 21409:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549554518/real 1549554518] req@ffff896542276900 x1624787078928400/t0(0) o104->fir-MDT0000@10.8.24.24@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549554525 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[37225.656467] Lustre: 21409:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages
[37273.464965] Lustre: fir-MDT0000: haven't heard from client 2a2c1530-562a-e1a0-9f97-e43b6462c8be (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89859bab9c00, cur 1549554573 expire 1549554423 last 1549554346
[37273.486695] Lustre: Skipped 2 previous similar messages
[37635.444496] Lustre: fir-MDT0000: haven't heard from client eb4112c5-9899-1492-0b5e-0c38772e631e (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897420fb5800, cur 1549554935 expire 1549554785 last 1549554708
[37635.466199] Lustre: Skipped 2 previous similar messages
[37839.074406] Lustre: fir-MDT0002: Client bcdbcaa2-b5a0-6ff6-1390-a90accf35015 (at 10.9.106.20@o2ib4) reconnecting
[37839.082176] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[37839.082178] Lustre: Skipped 7 previous similar messages
[37839.100082] Lustre: Skipped 2 previous similar messages
[37919.704633] Lustre: 21607:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549555212/real 1549555212] req@ffff89740c339b00 x1624787079463328/t0(0) o104->fir-MDT0000@10.8.24.24@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549555219 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[37919.731888] Lustre: 21607:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 13 previous similar messages
[37933.741983] Lustre: 21607:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549555226/real 1549555226] req@ffff89740c339b00 x1624787079463328/t0(0) o104->fir-MDT0000@10.8.24.24@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549555233 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[37933.769236] Lustre: 21607:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message
[37954.778509] Lustre: 21607:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549555247/real 1549555247] req@ffff89740c339b00 x1624787079463328/t0(0) o104->fir-MDT0000@10.8.24.24@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549555254 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[37954.805763] Lustre: 21607:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
[37971.460774] Lustre: MGS: haven't heard from client 4ad9c40b-7820-c297-a848-86e5ace9c8a4 (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8996e778ac00, cur 1549555271 expire 1549555121 last 1549555044
[37971.481783] Lustre: Skipped 2 previous similar messages
[38376.082080] Lustre: 21718:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549555631/real 1549555631] req@ffff896606ed2d00 x1624787079741456/t0(0) o104->fir-MDT0000@10.8.24.24@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549555675 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[38376.109333] Lustre: 21718:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages
[38464.121290] Lustre: 21718:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549555719/real 1549555719] req@ffff896606ed2d00 x1624787079741456/t0(0) o104->fir-MDT0000@10.8.24.24@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549555763 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[38464.148538] Lustre: 21718:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message
[38508.158422] LustreError: 21718:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.24.24@o2ib6) failed to reply to blocking AST (req@ffff896606ed2d00 x1624787079741456 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff8977e2c64800/0x223c295dac3b7cb5 lrc: 4/0,0 mode: PR/PR res: [0x200000406:0x4a8:0x0].0x0 bits 0x13/0x0 rrc: 137 type: IBT flags: 0x60200400000020 nid: 10.8.24.24@o2ib6 remote: 0xbf762fd33f10837c expref: 12 pid: 21523 timeout: 38612 lvb_type: 0
[38508.201101] LustreError: 138-a: fir-MDT0000: A client on nid 10.8.24.24@o2ib6 was evicted due to a lock blocking callback time out: rc -110
[38508.213637] LustreError: 20641:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 176s: evicting client at 10.8.24.24@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8977e2c64800/0x223c295dac3b7cb5 lrc: 3/0,0 mode: PR/PR res: [0x200000406:0x4a8:0x0].0x0 bits 0x13/0x0 rrc: 137 type: IBT flags: 0x60200400000020 nid: 10.8.24.24@o2ib6 remote: 0xbf762fd33f10837c expref: 13 pid: 21523 timeout: 0 lvb_type: 0
[38527.496307] Lustre: fir-MDT0002: haven't heard from client 176dc5b7-faf2-3221-700a-8b3ef8fdf39c (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89650ce17c00, cur 1549555827 expire 1549555677 last 1549555600
[38527.518013] Lustre: Skipped 2 previous similar messages
[38593.960663] Lustre: MGS: Connection restored to 38f6d14e-7b39-af46-3df0-aa87488916e0 (at 10.8.24.24@o2ib6)
[38593.970317] Lustre: Skipped 4 previous similar messages
[38595.155179] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[39351.238576] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[39351.246970] Lustre: fir-MDT0002: Connection restored to ac7151a3-7cea-7afd-34ef-00451766b672 (at 10.9.106.20@o2ib4)
[39351.246973] Lustre: Skipped 7 previous similar messages
[39351.264229] Lustre: Skipped 2 previous similar messages
[39948.518100] Lustre: fir-MDT0002: haven't heard from client 284a51e6-9dcb-3d67-8c66-2d0516f18a34 (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8974772b7400, cur 1549557248 expire 1549557098 last 1549557021
[39948.539806] Lustre: Skipped 4 previous similar messages
[40011.718789] Lustre: MGS: Connection restored to 38f6d14e-7b39-af46-3df0-aa87488916e0 (at 10.8.24.24@o2ib6)
[40011.728448] Lustre: Skipped 1 previous similar message
[40107.311587] Lustre: fir-MDT0002: Client bcdbcaa2-b5a0-6ff6-1390-a90accf35015 (at 10.9.106.20@o2ib4) reconnecting
[40542.517527] Lustre: fir-MDT0000: haven't heard from client 2b454474-c706-d2e4-037d-d93cda7a871a (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897468a72000, cur 1549557842 expire 1549557692 last 1549557615
[40542.539229] Lustre: Skipped 2 previous similar messages
[40863.396597] Lustre: fir-MDT0002: Client bcdbcaa2-b5a0-6ff6-1390-a90accf35015 (at 10.9.106.20@o2ib4) reconnecting
[40863.406773] Lustre: Skipped 1 previous similar message
[40863.411946] Lustre: fir-MDT0002: Connection restored to ac7151a3-7cea-7afd-34ef-00451766b672 (at 10.9.106.20@o2ib4)
[40863.422397] Lustre: Skipped 8 previous similar messages
[40977.528367] Lustre: fir-MDT0000: haven't heard from client b5a178da-c21f-ebfc-0870-b0fecbb14619 (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89852f79a000, cur 1549558277 expire 1549558127 last 1549558050
[40977.550071] Lustre: Skipped 2 previous similar messages
[41307.536670] Lustre: fir-MDT0000: haven't heard from client 76299f36-61ce-4939-f6ac-5c4c7edcfe8d (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896544ed8c00, cur 1549558607 expire 1549558457 last 1549558380
[41307.558373] Lustre: Skipped 2 previous similar messages
[41569.807940] Lustre: MGS: Connection restored to 38f6d14e-7b39-af46-3df0-aa87488916e0 (at 10.8.24.24@o2ib6)
[41569.817601] Lustre: Skipped 6 previous similar messages
[41619.491018] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[41619.501018] Lustre: Skipped 1 previous similar message
[42375.570509] Lustre: fir-MDT0002: Client bcdbcaa2-b5a0-6ff6-1390-a90accf35015 (at 10.9.106.20@o2ib4) reconnecting
[42375.579790] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[42375.579792] Lustre: Skipped 7 previous similar messages
[42375.596163] Lustre: Skipped 1 previous similar message
[42638.808060] Lustre: 21494:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549559931/real 1549559931] req@ffff896641618f00 x1624787084388000/t0(0) o104->fir-MDT0000@10.8.24.24@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549559938 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[42638.835312] Lustre: 21494:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message
[42654.575951] Lustre: fir-MDT0002: haven't heard from client e4ed24ae-d121-e58a-697f-c449b1704b5b (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8974d277b800, cur 1549559954 expire 1549559804 last 1549559727
[42654.597656] Lustre: Skipped 8 previous similar messages
[43005.580270] Lustre: fir-MDT0000: haven't heard from client 6875d4c0-42da-0adc-0f20-0200e6843220 (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897715790c00, cur 1549560305 expire 1549560155 last 1549560078
[43005.601971] Lustre: Skipped 2 previous similar messages
[43013.850248] Lustre: MGS: Connection restored to 38f6d14e-7b39-af46-3df0-aa87488916e0 (at 10.8.24.24@o2ib6)
[43013.859906] Lustre: Skipped 4 previous similar messages
[43131.653281] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[43293.593056] Lustre: MGS: haven't heard from client 3334dcad-6e8d-ef09-be44-4f8d3c40720d (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89851124cc00, cur 1549560593 expire 1549560443 last 1549560366
[43293.614066] Lustre: Skipped 2 previous similar messages
[43539.551665] Lustre: 21730:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549560827/real 1549560827] req@ffff8974636bbc00 x1624787085205344/t0(0) o104->fir-MDT0000@10.8.24.24@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549560838 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[43539.578918] Lustre: 21730:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
[43550.588943] Lustre: 21730:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549560838/real 1549560838] req@ffff8974636bbc00 x1624787085205344/t0(0) o104->fir-MDT0000@10.8.24.24@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549560849 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[43561.616223] Lustre: 21730:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549560850/real 1549560850] req@ffff8974636bbc00 x1624787085205344/t0(0) o104->fir-MDT0000@10.8.24.24@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549560861 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[43572.643498] Lustre: 21730:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549560861/real 1549560861] req@ffff8974636bbc00 x1624787085205344/t0(0) o104->fir-MDT0000@10.8.24.24@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549560872 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[43594.671043] Lustre: 21730:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549560883/real 1549560883] req@ffff8974636bbc00 x1624787085205344/t0(0) o104->fir-MDT0000@10.8.24.24@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549560894 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[43594.698293] Lustre: 21730:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message
[43603.598934] Lustre: fir-MDT0002: haven't heard from client ced06490-469f-6cf5-ea58-e3228e78d46f (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8984fd20dc00, cur 1549560903 expire 1549560753 last 1549560676
[43603.620642] Lustre: Skipped 2 previous similar messages
[43640.773996] Lustre: MGS: Connection restored to 38f6d14e-7b39-af46-3df0-aa87488916e0 (at 10.8.24.24@o2ib6)
[43640.783655] Lustre: Skipped 7 previous similar messages
[43887.736940] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[43887.746945] Lustre: Skipped 2 previous similar messages
[44643.807968] Lustre: fir-MDT0002: Client bcdbcaa2-b5a0-6ff6-1390-a90accf35015 (at 10.9.106.20@o2ib4) reconnecting
[44643.818170] Lustre: fir-MDT0002: Connection restored to ac7151a3-7cea-7afd-34ef-00451766b672 (at 10.9.106.20@o2ib4)
[44643.828603] Lustre: Skipped 5 previous similar messages
[45399.899542] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[45399.908884] Lustre: fir-MDT0002: Connection restored to ac7151a3-7cea-7afd-34ef-00451766b672 (at 10.9.106.20@o2ib4)
[45399.919977] Lustre: Skipped 2 previous similar messages
[46155.972773] Lustre: fir-MDT0002: Client bcdbcaa2-b5a0-6ff6-1390-a90accf35015 (at 10.9.106.20@o2ib4) reconnecting
[46155.982978] Lustre: fir-MDT0002: Connection restored to ac7151a3-7cea-7afd-34ef-00451766b672 (at 10.9.106.20@o2ib4)
[46155.993412] Lustre: Skipped 1 previous similar message
[46912.072775] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[46912.074321] Lustre: fir-MDT0002: Connection restored to ac7151a3-7cea-7afd-34ef-00451766b672 (at 10.9.106.20@o2ib4)
[46912.074323] Lustre: Skipped 1 previous similar message
[46912.098349] Lustre: Skipped 2 previous similar messages
[47525.151203] Lustre: MGS: Connection restored to ad1b8c61-ba8b-a6be-d55c-d4e268e643fc (at 10.8.2.19@o2ib6)
[47525.160776] Lustre: Skipped 10 previous similar messages
[47637.695488] Lustre: 21735:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549564930/real 1549564930] req@ffff897337287b00 x1624787094169648/t0(0) o104->fir-MDT0002@10.8.15.5@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549564937 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[47644.722673] Lustre: 21735:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549564937/real 1549564937] req@ffff897337287b00 x1624787094169648/t0(0) o104->fir-MDT0002@10.8.15.5@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549564944 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[47644.749855] Lustre: 21735:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages
[47654.706921] Lustre: 21746:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549564947/real 1549564947] req@ffff89972c28b300 x1624787094174336/t0(0) o104->fir-MDT0002@10.8.15.5@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549564954 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[47654.734108] Lustre: 21746:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 9 previous similar messages
[47668.147783] Lustre: fir-MDT0002: Client bcdbcaa2-b5a0-6ff6-1390-a90accf35015 (at 10.9.106.20@o2ib4) reconnecting
[47673.572391] Lustre: 21509:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549564965/real 1549564965] req@ffff897361fb6300 x1624787094168176/t0(0) o104->fir-MDT0002@10.8.15.5@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549564972 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[47673.599564] Lustre: 21509:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 14 previous similar messages
[47714.760431] Lustre: 21735:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549565007/real 1549565007] req@ffff897337287b00 x1624787094169648/t0(0) o104->fir-MDT0002@10.8.15.5@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549565014 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[47714.787596] Lustre: 21735:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 33 previous similar messages
[47784.799210] LustreError: 21735:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.15.5@o2ib6) failed to reply to blocking AST (req@ffff897337287b00 x1624787094169648 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff89770ea06780/0x223c295de788c1e7 lrc: 4/0,0 mode: PR/PR res: [0x2c00016e3:0xafcd:0x0].0x0 bits 0x13/0x0 rrc: 8 type: IBT flags: 0x60200400000020 nid: 10.8.15.5@o2ib6 remote: 0x3bceab939d1ad79b expref: 1525 pid: 21558 timeout: 47925 lvb_type: 0
[47784.841812] LustreError: 138-a: fir-MDT0002: A client on nid 10.8.15.5@o2ib6 was evicted due to a lock blocking callback time out: rc -110
[47784.854266] LustreError: 20641:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 154s: evicting client at 10.8.15.5@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff89770ea06780/0x223c295de788c1e7 lrc: 3/0,0 mode: PR/PR res: [0x2c00016e3:0xafcd:0x0].0x0 bits 0x13/0x0 rrc: 8 type: IBT flags: 0x60200400000020 nid: 10.8.15.5@o2ib6 remote: 0x3bceab939d1ad79b expref: 1526 pid: 21558 timeout: 0 lvb_type: 0
[47827.711956] Lustre: MGS: haven't heard from client 4c3c30ae-5811-d8c8-e710-418d03904bc7 (at 10.8.15.5@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8996d7ebf400, cur 1549565127 expire 1549564977 last 1549564900
[47827.732880] Lustre: Skipped 2 previous similar messages
[48138.620652] Lustre: MGS: Connection restored to d1cae5c1-5574-6ef7-2d6f-b39b3d196bd5 (at 10.8.3.16@o2ib6)
[48138.630221] Lustre: Skipped 10 previous similar messages
[48424.233498] Lustre: fir-MDT0002: Client bcdbcaa2-b5a0-6ff6-1390-a90accf35015 (at 10.9.106.20@o2ib4) reconnecting
[48424.243682] Lustre: Skipped 1 previous similar message
[49180.322567] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[49180.322673] Lustre: fir-MDT0002: Connection restored to ac7151a3-7cea-7afd-34ef-00451766b672 (at 10.9.106.20@o2ib4)
[49180.322675] Lustre: Skipped 4 previous similar messages
[49180.348224] Lustre: Skipped 2 previous similar messages
[49936.385981] Lustre: fir-MDT0002: Client bcdbcaa2-b5a0-6ff6-1390-a90accf35015 (at 10.9.106.20@o2ib4) reconnecting
[49936.396192] Lustre: fir-MDT0002: Connection restored to ac7151a3-7cea-7afd-34ef-00451766b672 (at 10.9.106.20@o2ib4)
[49936.406634] Lustre: Skipped 1 previous similar message
[50692.487769] Lustre: fir-MDT0002: Client bcdbcaa2-b5a0-6ff6-1390-a90accf35015 (at 10.9.106.20@o2ib4) reconnecting
[50692.497951] Lustre: Skipped 1 previous similar message
[50692.500381] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[50692.500384] Lustre: Skipped 1 previous similar message
[51448.573982] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[51448.584014] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[51448.594279] Lustre: Skipped 2 previous similar messages
[52204.656178] Lustre: fir-MDT0002: Client bcdbcaa2-b5a0-6ff6-1390-a90accf35015 (at 10.9.106.20@o2ib4) reconnecting
[52204.666353] Lustre: Skipped 1 previous similar message
[52204.671525] Lustre: fir-MDT0002: Connection restored to ac7151a3-7cea-7afd-34ef-00451766b672 (at 10.9.106.20@o2ib4)
[52204.681995] Lustre: Skipped 1 previous similar message
[52960.746703] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[52960.756702] Lustre: Skipped 1 previous similar message
[52960.761878] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[52960.772138] Lustre: Skipped 1 previous similar message
[53716.827286] Lustre: fir-MDT0002: Client bcdbcaa2-b5a0-6ff6-1390-a90accf35015 (at 10.9.106.20@o2ib4) reconnecting
[53716.837465] Lustre: Skipped 1 previous similar message
[53716.842637] Lustre: fir-MDT0002: Connection restored to ac7151a3-7cea-7afd-34ef-00451766b672 (at 10.9.106.20@o2ib4)
[53716.853076] Lustre: Skipped 1 previous similar message
[53917.295066] LNet: Service thread pid 21388 was inactive for 200.41s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[53917.312003] Pid: 21388, comm: mdt02_014 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
[53917.321784] Call Trace:
[53917.324251] [] ldlm_completion_ast+0x5b1/0x920 [ptlrpc]
[53917.331201] [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc]
[53917.338410] [] mdt_object_local_lock+0x50b/0xb20 [mdt]
[53917.345242] [] mdt_object_lock_internal+0x70/0x3e0 [mdt]
[53917.352243] [] mdt_getattr_name_lock+0x101d/0x1c30 [mdt]
[53917.359248] [] mdt_intent_getattr+0x2b5/0x480 [mdt]
[53917.365814] [] mdt_intent_policy+0x2e8/0xd00 [mdt]
[53917.372299] [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc]
[53917.379063] [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc]
[53917.386191] [] tgt_enqueue+0x62/0x210 [ptlrpc]
[53917.392363] [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[53917.399307] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[53917.407045] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
[53917.413370] [] kthread+0xd1/0xe0
[53917.418286] [] ret_from_fork_nospec_begin+0xe/0x21
[53917.424761] [] 0xffffffffffffffff
[53917.429795] LustreError: dumping log to /tmp/lustre-log.1549571216.21388
[53940.853664] Lustre: fir-MDT0000: haven't heard from client f00f87f8-194f-1408-ef61-be9f9b6e5206 (at 10.8.1.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8977e6f8dc00, cur 1549571240 expire 1549571090 last 1549571013
[53940.875302] Lustre: Skipped 4 previous similar messages
[54016.877578] LustreError: 21388:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1549571016, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff897298bdaac0/0x223c295e14fba950 lrc: 3/1,0 mode: --/PR res: [0x2c0003357:0x1:0x0].0x26c78500 bits 0x2/0x0 rrc: 3 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21388 timeout: 0 lvb_type: 0
[54235.783100] Lustre: 20799:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549571523/real 1549571523] req@ffff89844a698900 x1624787123744896/t0(0) o104->fir-MDT0002@10.8.20.11@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549571534 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[54235.810347] Lustre: 20799:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 77 previous similar messages
[54246.820374] Lustre: 20799:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549571534/real 1549571534] req@ffff89844a698900 x1624787123744896/t0(0) o104->fir-MDT0002@10.8.20.11@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549571545 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[54268.847927] Lustre: 20799:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549571556/real 1549571556] req@ffff89844a698900 x1624787123744896/t0(0) o104->fir-MDT0002@10.8.20.11@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549571567 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[54268.875176] Lustre: 20799:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message
[54312.825011] Lustre: 21682:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8984e43f1200 x1624748261526944/t0(0) o101->bcdbcaa2-b5a0-6ff6-1390-a90accf35015@10.9.106.20@o2ib4:556/0 lens 592/3264 e 24 to 0 dl 1549571616 ref 2 fl Interpret:/0/0 rc 0/0
[54312.853997] Lustre: 21682:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message
[54312.886028] Lustre: 20799:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549571601/real 1549571601] req@ffff89844a698900 x1624787123744896/t0(0) o104->fir-MDT0002@10.8.20.11@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549571612 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[54312.913274] Lustre: 20799:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages
[54318.938886] Lustre: fir-MDT0002: Client bcdbcaa2-b5a0-6ff6-1390-a90accf35015 (at 10.9.106.20@o2ib4) reconnecting
[54318.949061] Lustre: Skipped 1 previous similar message
[54318.954240] Lustre: fir-MDT0002: Connection restored to ac7151a3-7cea-7afd-34ef-00451766b672 (at 10.9.106.20@o2ib4)
[54318.964694] Lustre: Skipped 3 previous similar messages
[54355.873265] Lustre: MGS: haven't heard from client 97b6c46a-cad0-c09e-4410-7b3e833c17ae (at 10.8.20.11@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8967f536fc00, cur 1549571655 expire 1549571505 last 1549571428
[54355.894271] Lustre: Skipped 2 previous similar messages
[54472.924337] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[54512.868138] Lustre: fir-MDT0000: haven't heard from client a588f8ab-2914-47e1-9056-537179d8cc6f (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8977e60b0800, cur 1549571812 expire 1549571662 last 1549571585
[54512.889766] Lustre: Skipped 5 previous similar messages
[54678.741185] Lustre: 21573:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549571970/real 1549571970] req@ffff897221e1da00 x1624787126429152/t0(0) o104->fir-MDT0002@10.8.14.7@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549571977 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[54678.768348] Lustre: 21573:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages
[54736.873996] Lustre: fir-MDT0002: haven't heard from client a4d87b82-ef32-e920-6e6c-e41a82a7f24a (at 10.8.10.33@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8987bf388400, cur 1549572036 expire 1549571886 last 1549571809
[54736.895697] Lustre: Skipped 2 previous similar messages
[54779.318626] LustreError: 21449:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.14.7@o2ib6) returned error from blocking AST (req@ffff8984012c1500 x1624787126524288 status -107 rc -107), evict it ns: mdt-fir-MDT0002_UUID lock: ffff89743c289f80/0x223c295e17486ce7 lrc: 4/0,0 mode: PR/PR res: [0x2c000175b:0x30c:0x0].0x0 bits 0x13/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.8.14.7@o2ib6 remote: 0x7c3a6c719b592b25 expref: 2926 pid: 21587 timeout: 54927 lvb_type: 0
[54779.361489] LustreError: 138-a: fir-MDT0002: A client on nid 10.8.14.7@o2ib6 was evicted due to a lock blocking callback time out: rc -107
[54779.373941] LustreError: 20641:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 63s: evicting client at 10.8.14.7@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff89743c289f80/0x223c295e17486ce7 lrc: 3/0,0 mode: PR/PR res: [0x2c000175b:0x30c:0x0].0x0 bits 0x13/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.8.14.7@o2ib6 remote: 0x7c3a6c719b592b25 expref: 2927 pid: 21587 timeout: 0 lvb_type: 0
[54779.417953] LustreError: 21632:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8977e2329b00 x1624787126765232/t0(0) o104->fir-MDT0002@10.8.14.7@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
[54822.875937] Lustre: fir-MDT0000: haven't heard from client 706c1b78-146b-d70f-b0af-45209f88d1a7 (at 10.8.14.7@o2ib6) in 178 seconds. I think it's dead, and I am evicting it. exp ffff8977e6103400, cur 1549572122 expire 1549571972 last 1549571944
[54822.897553] Lustre: Skipped 2 previous similar messages
[55086.917545] Lustre: MGS: haven't heard from client 3f31635c-f6b0-5a6f-f3b0-ab88717609c2 (at 10.8.10.33@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89978b7fbc00, cur 1549572386 expire 1549572236 last 1549572159
[55086.938556] Lustre: Skipped 1 previous similar message
[55182.888573] Lustre: MGS: Connection restored to dff2fe10-03c6-c93a-38ba-c6e18ef92d18 (at 10.8.10.33@o2ib6)
[55182.898225] Lustre: Skipped 10 previous similar messages
[55229.008605] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[55783.230818] Lustre: MGS: Connection restored to d5201504-af42-83fa-45a5-b907c09e88b6 (at 10.9.112.13@o2ib4)
[55783.240556] Lustre: Skipped 15 previous similar messages
[55788.900059] Lustre: fir-MDT0000: haven't heard from client 1278cdf0-a27f-222f-a45b-a06184e19558 (at 10.8.10.33@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8963e9e37c00, cur 1549573088 expire 1549572938 last 1549572861
[55788.921761] Lustre: Skipped 2 previous similar messages
[55985.091720] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[56464.855800] perf: interrupt took too long (3127 > 3126), lowering kernel.perf_event_max_sample_rate to 63000
[56741.175762] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[56741.185785] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[56741.196050] Lustre: Skipped 30 previous similar messages
[57270.937448] Lustre: fir-MDT0000: haven't heard from client 85e8a876-efff-2831-db66-61f0415adaa0 (at 10.8.21.19@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8987d7f03800, cur 1549574570 expire 1549574420 last 1549574343
[57270.959149] Lustre: Skipped 5 previous similar messages
[57497.274950] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[57497.284987] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[57497.295251] Lustre: Skipped 3 previous similar messages
[57839.951570] Lustre: fir-MDT0000: haven't heard from client d351c60b-e83d-2faf-4977-70301d72aa1d (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8996f067d800, cur 1549575139 expire 1549574989 last 1549574912
[57839.973280] Lustre: Skipped 2 previous similar messages
[58074.219391] Lustre: 20796:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549575366/real 1549575366] req@ffff8972227d3f00 x1624787137239632/t0(0) o104->fir-MDT0002@10.8.10.33@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549575373 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[58074.246666] Lustre: 20796:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 237 previous similar messages
[58095.256915] Lustre: 20796:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549575387/real 1549575387] req@ffff8972227d3f00 x1624787137239632/t0(0) o104->fir-MDT0002@10.8.10.33@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549575394 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[58095.284164] Lustre: 20796:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
[58097.958074] Lustre: fir-MDT0000: haven't heard from client 2af257a5-1065-5a4f-8bd3-5ac650d59773 (at 10.8.10.33@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89729cb92400, cur 1549575397 expire 1549575247 last 1549575170
[58097.979775] Lustre: Skipped 2 previous similar messages
[58253.374198] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[58253.384240] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[58253.394503] Lustre: Skipped 3 previous similar messages
[58382.359136] Lustre: 21791:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549575670/real 1549575670] req@ffff8983d7727200 x1624787139365536/t0(0) o104->fir-MDT0000@10.8.24.24@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549575681 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[58456.967099] Lustre: fir-MDT0002: haven't heard from client acdcfa73-de08-8413-3214-93443dc4370a (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8963ac66c000, cur 1549575756 expire 1549575606 last 1549575529
[58456.988803] Lustre: Skipped 2 previous similar messages
[58638.453557] Lustre: 21675:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549575926/real 1549575926] req@ffff89839dff7500 x1624787142846672/t0(0) o104->fir-MDT0000@10.8.24.24@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549575937 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[58638.480819] Lustre: 21675:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 6 previous similar messages
[58781.497178] LustreError: 21675:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.24.24@o2ib6) failed to reply to blocking AST (req@ffff89839dff7500 x1624787142846672 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff898451780900/0x223c295e452abe08 lrc: 4/0,0 mode: PR/PR res: [0x200000406:0x4a8:0x0].0x0 bits 0x13/0x0 rrc: 131 type: IBT flags: 0x60200400000020 nid: 10.8.24.24@o2ib6 remote: 0xeda142b7ccef5734 expref: 11 pid: 21315 timeout: 58918 lvb_type: 0
[58781.539864] LustreError: 138-a: fir-MDT0000: A client on nid 10.8.24.24@o2ib6 was evicted due to a lock blocking callback time out: rc -110
[58781.552400] LustreError: 20641:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 154s: evicting client at 10.8.24.24@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff898451780900/0x223c295e452abe08 lrc: 3/0,0 mode: PR/PR res: [0x200000406:0x4a8:0x0].0x0 bits 0x13/0x0 rrc: 131 type: IBT flags: 0x60200400000020 nid: 10.8.24.24@o2ib6 remote: 0xeda142b7ccef5734 expref: 12 pid: 21315 timeout: 0 lvb_type: 0
[59009.473343] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[59009.483380] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[59009.493638] Lustre: Skipped 6 previous similar messages
[59185.930301] Lustre: 21423:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549576433/real 1549576433] req@ffff8971ceb3ad00 x1624787146379392/t0(0) o104->fir-MDT0000@10.8.24.24@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549576484 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[59185.957559] Lustre: 21423:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 13 previous similar messages
[59274.987599] Lustre: fir-MDT0000: haven't heard from client dac84c08-eb96-2937-4097-438cc10b8eef (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8983c8abc400, cur 1549576574 expire 1549576424 last 1549576347
[59275.009322] Lustre: Skipped 4 previous similar messages
[59765.572571] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[59765.582602] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[59765.592861] Lustre: Skipped 3 previous similar messages
[60238.012146] Lustre: fir-MDT0000: haven't heard from client 82c3fbf7-ed12-0c92-5868-f9e808a05030 (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89638e30c400, cur 1549577537 expire 1549577387 last 1549577310
[60238.033849] Lustre: Skipped 2 previous similar messages
[60521.672004] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[60521.682034] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[60521.692312] Lustre: Skipped 3 previous similar messages
[61068.032521] Lustre: fir-MDT0000: haven't heard from client f3abbf68-cd46-7980-c997-d1f225abefb9 (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897137fa7800, cur 1549578367 expire 1549578217 last 1549578140
[61068.054227] Lustre: Skipped 5 previous similar messages
[61277.771459] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[61277.781486] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[61277.791752] Lustre: Skipped 6 previous similar messages
[62033.870668] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[62033.880698] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[62033.890963] Lustre: Skipped 3 previous similar messages
[62045.057055] Lustre: fir-MDT0000: haven't heard from client 41a266fa-5fd6-ebca-126e-993c95a24752 (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89723e7dc400, cur 1549579344 expire 1549579194 last 1549579117
[62045.078894] Lustre: Skipped 5 previous similar messages
[62397.837155] Lustre: DEBUG MARKER: Thu Feb 7 14:48:16 2019
[62789.969744] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[62789.979778] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[62789.990042] Lustre: Skipped 6 previous similar messages
[62801.076127] Lustre: fir-MDT0000: haven't heard from client bf793bf2-a4ce-104d-e40a-802d41887ca7 (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8972e1fe1c00, cur 1549580100 expire 1549579950 last 1549579873
[62801.097827] Lustre: Skipped 5 previous similar messages
[63546.068815] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[63546.078844] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[63546.089257] Lustre: Skipped 6 previous similar messages
[63598.096023] Lustre: fir-MDT0000: haven't heard from client b8240219-2ae5-f36c-8100-b7dacd3c07fc (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8972bbfbf800, cur 1549580897 expire 1549580747 last 1549580670
[63598.117728] Lustre: Skipped 5 previous similar messages
[64249.118200] Lustre: MGS: haven't heard from client 8bf97fcc-f717-2285-6969-96e7f78b9548 (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896409212c00, cur 1549581548 expire 1549581398 last 1549581321
[64249.139212] Lustre: Skipped 6 previous similar messages
[64269.979261] Lustre: MGS: Connection restored to 38f6d14e-7b39-af46-3df0-aa87488916e0 (at 10.8.24.24@o2ib6)
[64269.988919] Lustre: Skipped 7 previous similar messages
[64302.168018] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[64891.466155] Lustre: MGS: Connection restored to f11f601d-401d-0aef-2f13-c25318014c3d (at 10.8.17.17@o2ib6)
[64891.475812] Lustre: Skipped 6 previous similar messages
[65024.132889] Lustre: fir-MDT0000: haven't heard from client 274266dd-d113-7b9f-12c9-ef15c802ac1b (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89844bf0a000, cur 1549582323 expire 1549582173 last 1549582096
[65024.154591] Lustre: Skipped 5 previous similar messages
[65058.251252] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[65551.214268] Lustre: MGS: Connection restored to 38f6d14e-7b39-af46-3df0-aa87488916e0 (at 10.8.24.24@o2ib6)
[65551.223924] Lustre: Skipped 6 previous similar messages
[65814.335438] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[65906.155011] Lustre: fir-MDT0000: haven't heard from client 1b587538-5f92-3367-5f39-075cab7d3820 (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89713fb97000, cur 1549583205 expire 1549583055 last 1549582978
[65906.176715] Lustre: Skipped 5 previous similar messages
[66322.560101] Lustre: MGS: Connection restored to 38f6d14e-7b39-af46-3df0-aa87488916e0 (at 10.8.24.24@o2ib6)
[66322.569756] Lustre: Skipped 6 previous similar messages
[66570.418734] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[66627.198935] Lustre: MGS: haven't heard from client 8e8e686b-e84d-092a-b226-45cae80ea9ed (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8963f1382800, cur 1549583926 expire 1549583776 last 1549583699
[66627.219949] Lustre: Skipped 5 previous similar messages
[66978.579904] Lustre: MGS: Connection restored to 38f6d14e-7b39-af46-3df0-aa87488916e0 (at 10.8.24.24@o2ib6)
[66978.589564] Lustre: Skipped 6 previous similar messages
[67283.188575] Lustre: fir-MDT0000: haven't heard from client e79a52ad-b85e-333c-14be-d58f88ee9e0f (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8972f0b01000, cur 1549584582 expire 1549584432 last 1549584355
[67283.210278] Lustre: Skipped 5 previous similar messages
[67326.502084] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[67631.763637] Lustre: MGS: Connection restored to 38f6d14e-7b39-af46-3df0-aa87488916e0 (at 10.8.24.24@o2ib6)
[67631.773294] Lustre: Skipped 6 previous similar messages
[67962.205768] Lustre: fir-MDT0000: haven't heard from client c7534741-3a86-1e38-bf4d-1cdbd1279905 (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff898470391c00, cur 1549585261 expire 1549585111 last 1549585034
[67962.227493] Lustre: Skipped 5 previous similar messages
[68082.586380] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[68336.423523] Lustre: MGS: Connection restored to 38f6d14e-7b39-af46-3df0-aa87488916e0 (at 10.8.24.24@o2ib6)
[68336.433178] Lustre: Skipped 6 previous similar messages
[68641.222627] Lustre: fir-MDT0000: haven't heard from client 2688633e-1055-6cfb-08d5-2a680df0b15d (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8985b5255c00, cur 1549585940 expire 1549585790 last 1549585713
[68641.244328] Lustre: Skipped 5 previous similar messages
[68838.669722] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[69021.742226] Lustre: MGS: Connection restored to 38f6d14e-7b39-af46-3df0-aa87488916e0 (at 10.8.24.24@o2ib6)
[69021.751883] Lustre: Skipped 6 previous similar messages
[69552.245521] Lustre: fir-MDT0000: haven't heard from client 00e1eece-ffaa-04d3-0e9c-f1380ab84c6b (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89975fa6b000, cur 1549586851 expire 1549586701 last 1549586624
[69552.267243] Lustre: Skipped 5 previous similar messages
[69594.754132] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[69942.763717] Lustre: MGS: Connection restored to 38f6d14e-7b39-af46-3df0-aa87488916e0 (at 10.8.24.24@o2ib6)
[69942.773379] Lustre: Skipped 6 previous similar messages
[70247.262904] Lustre: fir-MDT0000: haven't heard from client b4a3bb23-bcfc-6104-c975-120c89ea8e9d (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8964eef0d400, cur 1549587546 expire 1549587396 last 1549587319
[70247.284631] Lustre: Skipped 5 previous similar messages
[70350.837579] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[70596.029395] Lustre: MGS: Connection restored to 38f6d14e-7b39-af46-3df0-aa87488916e0 (at 10.8.24.24@o2ib6)
[70596.039060] Lustre: Skipped 6 previous similar messages
[70876.279519] Lustre: fir-MDT0000: haven't heard from client fc9415e7-d01b-84f8-2e5b-b4d0717f5c1f (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89845739ec00, cur 1549588175 expire 1549588025 last 1549587948
[70876.301236] Lustre: Skipped 5 previous similar messages
[71106.922240] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[71253.102841] Lustre: MGS: Connection restored to 38f6d14e-7b39-af46-3df0-aa87488916e0 (at 10.8.24.24@o2ib6)
[71253.112498] Lustre: Skipped 6 previous similar messages
[71533.295235] Lustre: fir-MDT0000: haven't heard from client 46be6b00-66d7-c50e-224b-69548a08fbd1 (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89735239cc00, cur 1549588832 expire 1549588682 last 1549588605
[71533.316938] Lustre: Skipped 5 previous similar messages
[71863.005903] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[71863.015923] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[71863.026179] Lustre: Skipped 5 previous similar messages
[71943.219416] LNet: Service thread pid 21544 was inactive for 200.34s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[71943.236356] Pid: 21544, comm: mdt00_046 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
[71943.246090] Call Trace:
[71943.248544] [] 0xffffffffffffffff
[71943.253570] LustreError: dumping log to /tmp/lustre-log.1549589241.21544
[72146.310840] Lustre: fir-MDT0000: haven't heard from client b342f87d-1648-83c7-2f60-d1b4cce4bbae (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896556796400, cur 1549589445 expire 1549589295 last 1549589218
[72146.332550] Lustre: Skipped 5 previous similar messages
[72338.135330] Lustre: 21553:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8963eef38f00 x1624712163628944/t0(0) o101->a9e2a338-60d5-67b8-b23e-aaade58467c0@10.8.7.14@o2ib6:461/0 lens 480/568 e 24 to 0 dl 1549589641 ref 2 fl Interpret:/0/0 rc 0/0
[72344.934253] Lustre: fir-MDT0002: Client a9e2a338-60d5-67b8-b23e-aaade58467c0 (at 10.8.7.14@o2ib6) reconnecting
[72491.090516] LNet: Service thread pid 21544 completed after 748.20s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
[72536.917014] Lustre: MGS: Connection restored to 38f6d14e-7b39-af46-3df0-aa87488916e0 (at 10.8.24.24@o2ib6)
[72536.926675] Lustre: Skipped 10 previous similar messages
[72619.105097] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[72841.328017] Lustre: fir-MDT0000: haven't heard from client 7007e087-709b-244c-a47c-e3fdb192c6a6 (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8977183d2800, cur 1549590140 expire 1549589990 last 1549589913
[72841.349723] Lustre: Skipped 8 previous similar messages
[73271.935846] Lustre: MGS: Connection restored to 38f6d14e-7b39-af46-3df0-aa87488916e0 (at 10.8.24.24@o2ib6)
[73271.945499] Lustre: Skipped 6 previous similar messages
[73375.189833] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[73923.355148] Lustre: fir-MDT0000: haven't heard from client cb991974-722a-7f80-6ae9-8e507b52074f (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff898515f32800, cur 1549591222 expire 1549591072 last 1549590995
[73923.376854] Lustre: Skipped 8 previous similar messages
[74084.889718] Lustre: MGS: Connection restored to 38f6d14e-7b39-af46-3df0-aa87488916e0 (at 10.8.24.24@o2ib6)
[74084.899373] Lustre: Skipped 6 previous similar messages
[74131.273745] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[74754.376481] Lustre: fir-MDT0000: haven't heard from client b3b19d74-adbd-2a14-e60b-3321bdace621 (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897575a8d400, cur 1549592053 expire 1549591903 last 1549591826
[74754.398185] Lustre: Skipped 11 previous similar messages
[74791.725849] Lustre: MGS: Connection restored to a2a9b575-96f2-5a7b-2fde-7aaac667d675 (at 10.8.26.33@o2ib6)
[74791.735511] Lustre: Skipped 9 previous similar messages
[74887.358781] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[75438.393285] Lustre: fir-MDT0000: haven't heard from client 1bee1558-0718-812b-196d-3bba3089fa38 (at 10.8.9.5@o2ib6) in 227 seconds. I think it's dead, and I am evicting it.
exp ffff8977e6046000, cur 1549592737 expire 1549592587 last 1549592510 [75438.414813] Lustre: Skipped 5 previous similar messages [75613.081166] Lustre: MGS: Connection restored to 38f6d14e-7b39-af46-3df0-aa87488916e0 (at 10.8.24.24@o2ib6) [75613.090834] Lustre: Skipped 12 previous similar messages [75643.443855] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [76279.414303] Lustre: fir-MDT0000: haven't heard from client 765eb8bd-9513-5da1-55bc-4d7d5527cb1f (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897176bd9000, cur 1549593578 expire 1549593428 last 1549593351 [76279.436002] Lustre: Skipped 8 previous similar messages [76329.086083] Lustre: MGS: Connection restored to 38f6d14e-7b39-af46-3df0-aa87488916e0 (at 10.8.24.24@o2ib6) [76329.095741] Lustre: Skipped 6 previous similar messages [76399.527739] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [76961.431453] Lustre: fir-MDT0002: haven't heard from client bf251e25-d9e2-9a50-3359-a9d643a75679 (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff896f612f6c00, cur 1549594260 expire 1549594110 last 1549594033 [76961.453161] Lustre: Skipped 8 previous similar messages [77025.344319] Lustre: MGS: Connection restored to 38f6d14e-7b39-af46-3df0-aa87488916e0 (at 10.8.24.24@o2ib6) [77025.353977] Lustre: Skipped 9 previous similar messages [77155.612099] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [77199.345318] Lustre: 28129:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549594490/real 1549594490] req@ffff899674aa4e00 x1624787222474464/t0(0) o104->fir-MDT0000@10.8.26.33@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549594497 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 [77199.372571] Lustre: 28129:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message [77241.383370] Lustre: 28129:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549594532/real 1549594532] req@ffff899674aa4e00 x1624787222474464/t0(0) o104->fir-MDT0000@10.8.26.33@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549594539 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [77241.410620] Lustre: 28129:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages [77318.424302] Lustre: 28129:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549594609/real 1549594609] req@ffff899674aa4e00 x1624787222474464/t0(0) o104->fir-MDT0000@10.8.26.33@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549594616 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [77318.451551] Lustre: 28129:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 10 previous similar messages [77346.462025] LustreError: 28129:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.26.33@o2ib6) failed to reply to blocking AST (req@ffff899674aa4e00 x1624787222474464 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff89658a778240/0x223c295ed07d50e3 lrc: 4/0,0 mode: PR/PR res: 
[0x200000406:0x4a8:0x0].0x0 bits 0x13/0x0 rrc: 115 type: IBT flags: 0x60200400000020 nid: 10.8.26.33@o2ib6 remote: 0x6569ed8202e839b5 expref: 17 pid: 21503 timeout: 77486 lvb_type: 0 [77346.504709] LustreError: 138-a: fir-MDT0000: A client on nid 10.8.26.33@o2ib6 was evicted due to a lock blocking callback time out: rc -110 [77346.517249] LustreError: 20641:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 155s: evicting client at 10.8.26.33@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff89658a778240/0x223c295ed07d50e3 lrc: 3/0,0 mode: PR/PR res: [0x200000406:0x4a8:0x0].0x0 bits 0x13/0x0 rrc: 115 type: IBT flags: 0x60200400000020 nid: 10.8.26.33@o2ib6 remote: 0x6569ed8202e839b5 expref: 18 pid: 21503 timeout: 0 lvb_type: 0 [77659.448894] Lustre: fir-MDT0000: haven't heard from client ef862c40-6aa9-85b2-cbe8-94d2b684d798 (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8982ba720000, cur 1549594958 expire 1549594808 last 1549594731 [77659.470595] Lustre: Skipped 7 previous similar messages [77703.086521] Lustre: MGS: Connection restored to 38f6d14e-7b39-af46-3df0-aa87488916e0 (at 10.8.24.24@o2ib6) [77703.096175] Lustre: Skipped 9 previous similar messages [77911.696337] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [78092.136729] Lustre: 21547:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549595383/real 1549595383] req@ffff896fdf7bfb00 x1624787224136512/t0(0) o104->fir-MDT0000@10.8.26.33@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549595390 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 [78092.164002] Lustre: 21547:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages [78113.174253] Lustre: 21547:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549595404/real 1549595404] req@ffff896fdf7bfb00 x1624787224136512/t0(0) 
o104->fir-MDT0000@10.8.26.33@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549595411 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [78113.201509] Lustre: 21547:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages [78155.213310] Lustre: 21547:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549595446/real 1549595446] req@ffff896fdf7bfb00 x1624787224136512/t0(0) o104->fir-MDT0000@10.8.26.33@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549595453 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [78155.240565] Lustre: 21547:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages [78232.255246] Lustre: 21547:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549595523/real 1549595523] req@ffff896fdf7bfb00 x1624787224136512/t0(0) o104->fir-MDT0000@10.8.26.33@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549595530 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [78232.282498] Lustre: 21547:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 10 previous similar messages [78239.292437] LustreError: 21547:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.26.33@o2ib6) failed to reply to blocking AST (req@ffff896fdf7bfb00 x1624787224136512 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff8985c7b60000/0x223c295ed804bdc1 lrc: 4/0,0 mode: PR/PR res: [0x200000406:0x4a8:0x0].0x0 bits 0x13/0x0 rrc: 121 type: IBT flags: 0x60200400000020 nid: 10.8.26.33@o2ib6 remote: 0x9c9003c153eb65a4 expref: 16 pid: 21640 timeout: 78379 lvb_type: 0 [78239.335139] LustreError: 138-a: fir-MDT0000: A client on nid 10.8.26.33@o2ib6 was evicted due to a lock blocking callback time out: rc -110 [78239.347668] LustreError: 20641:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 154s: evicting client at 10.8.26.33@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8985c7b60000/0x223c295ed804bdc1 lrc: 3/0,0 mode: PR/PR res: [0x200000406:0x4a8:0x0].0x0 bits 0x13/0x0 rrc: 
121 type: IBT flags: 0x60200400000020 nid: 10.8.26.33@o2ib6 remote: 0x9c9003c153eb65a4 expref: 17 pid: 21640 timeout: 0 lvb_type: 0 [78294.467991] Lustre: fir-MDT0002: haven't heard from client 3b2df085-efb2-1d8e-1c48-5023679dd5aa (at 10.8.26.33@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89768c6b5800, cur 1549595593 expire 1549595443 last 1549595366 [78294.489699] Lustre: Skipped 5 previous similar messages [78342.533205] Lustre: MGS: Connection restored to a2a9b575-96f2-5a7b-2fde-7aaac667d675 (at 10.8.26.33@o2ib6) [78342.542867] Lustre: Skipped 6 previous similar messages [78667.781435] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [78849.169727] Lustre: 21525:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549596136/real 1549596136] req@ffff899668245d00 x1624787225574064/t0(0) o104->fir-MDT0000@10.8.26.33@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549596147 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 [78849.196982] Lustre: 21525:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message [78949.481531] Lustre: fir-MDT0000: haven't heard from client 67d2c584-ec18-497e-be04-6bdc15b58484 (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8995ddb1a000, cur 1549596248 expire 1549596098 last 1549596021 [78949.503251] Lustre: Skipped 7 previous similar messages [78971.520812] Lustre: MGS: Connection restored to a2a9b575-96f2-5a7b-2fde-7aaac667d675 (at 10.8.26.33@o2ib6) [78971.530469] Lustre: Skipped 12 previous similar messages [79150.779298] Lustre: 21066:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549596438/real 1549596438] req@ffff8974297ba100 x1624787225996192/t0(0) o104->fir-MDT0000@10.8.26.33@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549596449 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [79150.806547] Lustre: 21066:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 16 previous similar messages [79216.817974] LustreError: 21066:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.26.33@o2ib6) failed to reply to blocking AST (req@ffff8974297ba100 x1624787225996192 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff8975212a3f00/0x223c295edf4fa721 lrc: 4/0,0 mode: PR/PR res: [0x200000406:0x4a8:0x0].0x0 bits 0x13/0x0 rrc: 127 type: IBT flags: 0x60200400000020 nid: 10.8.26.33@o2ib6 remote: 0x57d345146a31c5cf expref: 11 pid: 21713 timeout: 79352 lvb_type: 0 [79216.860677] LustreError: 138-a: fir-MDT0000: A client on nid 10.8.26.33@o2ib6 was evicted due to a lock blocking callback time out: rc -110 [79216.873209] LustreError: 20641:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 154s: evicting client at 10.8.26.33@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8975212a3f00/0x223c295edf4fa721 lrc: 3/0,0 mode: PR/PR res: [0x200000406:0x4a8:0x0].0x0 bits 0x13/0x0 rrc: 127 type: IBT flags: 0x60200400000020 nid: 10.8.26.33@o2ib6 remote: 0x57d345146a31c5cf expref: 12 pid: 21713 timeout: 0 lvb_type: 0 [79423.865526] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [79464.250198] LustreError: 21643:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 
10.8.24.24@o2ib6) failed to reply to blocking AST (req@ffff898280a28300 x1624787226371856 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff8972a9ee3600/0x223c295ee0dbc2b7 lrc: 4/0,0 mode: PR/PR res: [0x200000406:0x4a8:0x0].0x0 bits 0x13/0x0 rrc: 128 type: IBT flags: 0x60200400000020 nid: 10.8.24.24@o2ib6 remote: 0xee9a28baf46a8a13 expref: 16 pid: 21063 timeout: 79604 lvb_type: 0 [79464.292912] LustreError: 138-a: fir-MDT0000: A client on nid 10.8.24.24@o2ib6 was evicted due to a lock blocking callback time out: rc -110 [79464.305458] LustreError: 20641:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 154s: evicting client at 10.8.24.24@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8972a9ee3600/0x223c295ee0dbc2b7 lrc: 3/0,0 mode: PR/PR res: [0x200000406:0x4a8:0x0].0x0 bits 0x13/0x0 rrc: 128 type: IBT flags: 0x60200400000020 nid: 10.8.24.24@o2ib6 remote: 0xee9a28baf46a8a13 expref: 17 pid: 21063 timeout: 0 lvb_type: 0 [79757.098515] Lustre: 21594:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549597048/real 1549597048] req@ffff898210a02a00 x1624787227027072/t0(0) o104->fir-MDT0000@10.8.24.24@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549597055 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [79757.125768] Lustre: 21594:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 31 previous similar messages [79883.142690] LustreError: 21594:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.24.24@o2ib6) failed to reply to blocking AST (req@ffff898210a02a00 x1624787227027072 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff897746710900/0x223c295ee369b5ad lrc: 4/0,0 mode: PR/PR res: [0x200000406:0x4a8:0x0].0x0 bits 0x13/0x0 rrc: 128 type: IBT flags: 0x60200400000020 nid: 10.8.24.24@o2ib6 remote: 0xc238c7fdebdc5c50 expref: 16 pid: 21061 timeout: 80023 lvb_type: 0 [79883.185373] LustreError: 138-a: fir-MDT0000: A client on nid 10.8.24.24@o2ib6 was evicted due to a lock blocking 
callback time out: rc -110 [79883.197941] LustreError: 20641:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 154s: evicting client at 10.8.24.24@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff897746710900/0x223c295ee369b5ad lrc: 3/0,0 mode: PR/PR res: [0x200000406:0x4a8:0x0].0x0 bits 0x13/0x0 rrc: 128 type: IBT flags: 0x60200400000020 nid: 10.8.24.24@o2ib6 remote: 0xc238c7fdebdc5c50 expref: 17 pid: 21061 timeout: 0 lvb_type: 0 [79899.505269] Lustre: fir-MDT0002: haven't heard from client 42ffaf68-66b2-917d-984d-0b92080b8284 (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896f0769bc00, cur 1549597198 expire 1549597048 last 1549596971 [79899.526979] Lustre: Skipped 9 previous similar messages [79901.428564] Lustre: MGS: Connection restored to 38f6d14e-7b39-af46-3df0-aa87488916e0 (at 10.8.24.24@o2ib6) [79901.438229] Lustre: Skipped 9 previous similar messages [80179.949641] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [80697.324615] Lustre: MGS: Connection restored to a2a9b575-96f2-5a7b-2fde-7aaac667d675 (at 10.8.26.33@o2ib6) [80697.334272] Lustre: Skipped 9 previous similar messages [80717.525803] Lustre: fir-MDT0002: haven't heard from client f7618d6f-7598-c57e-67ca-099073e287cd (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8971defa1000, cur 1549598016 expire 1549597866 last 1549597789 [80717.547504] Lustre: Skipped 10 previous similar messages [80936.032651] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [81468.544682] Lustre: fir-MDT0000: haven't heard from client aaf5a961-7c36-363c-f349-1f418b6e70eb (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff89862bedf400, cur 1549598767 expire 1549598617 last 1549598540 [81468.566387] Lustre: Skipped 8 previous similar messages [81493.783964] Lustre: MGS: Connection restored to 38f6d14e-7b39-af46-3df0-aa87488916e0 (at 10.8.24.24@o2ib6) [81493.793626] Lustre: Skipped 9 previous similar messages [81692.115601] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [82206.563193] Lustre: fir-MDT0000: haven't heard from client 4106ca9f-e9e3-6773-a46d-866639d66ac2 (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89832cb99800, cur 1549599505 expire 1549599355 last 1549599278 [82206.584897] Lustre: Skipped 8 previous similar messages [82220.300980] Lustre: MGS: Connection restored to 38f6d14e-7b39-af46-3df0-aa87488916e0 (at 10.8.24.24@o2ib6) [82220.310635] Lustre: Skipped 6 previous similar messages [82448.198590] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [82948.506192] LustreError: 21385:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.24.24@o2ib6) returned error from glimpse AST (req@ffff89829ebf6600 x1624787239597568 status -107 rc -107), evict it ns: mdt-fir-MDT0000_UUID lock: ffff8984e5ac72c0/0x223c295efc677b22 lrc: 4/0,0 mode: PW/PW res: [0x200003f90:0x1:0x0].0x0 bits 0x40/0x0 rrc: 9 type: IBT flags: 0x40200000000000 nid: 10.8.24.24@o2ib6 remote: 0x2b7cc926e50cd2e1 expref: 16 pid: 21379 timeout: 0 lvb_type: 0 [82948.548445] LustreError: 138-a: fir-MDT0000: A client on nid 10.8.24.24@o2ib6 was evicted due to a lock glimpse callback time out: rc -107 [82948.560907] LustreError: 20641:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 251s: evicting client at 10.8.24.24@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8984e5ac72c0/0x223c295efc677b22 lrc: 3/0,0 mode: PW/PW res: [0x200003f90:0x1:0x0].0x0 bits 0x40/0x0 rrc: 9 type: IBT flags: 0x40200000000000 nid: 10.8.24.24@o2ib6 remote: 
0x2b7cc926e50cd2e1 expref: 17 pid: 21379 timeout: 0 lvb_type: 0 [82958.584780] Lustre: MGS: haven't heard from client 06d89293-a31b-55a9-4acd-229ce5c0e04f (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896208bf3c00, cur 1549600257 expire 1549600107 last 1549600030 [82958.605785] Lustre: Skipped 5 previous similar messages [82966.916397] Lustre: MGS: Connection restored to 38f6d14e-7b39-af46-3df0-aa87488916e0 (at 10.8.24.24@o2ib6) [82966.926057] Lustre: Skipped 6 previous similar messages [83204.281572] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [83960.364568] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [83960.374597] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [83960.384879] Lustre: Skipped 3 previous similar messages [84423.618962] Lustre: fir-MDT0000: haven't heard from client 34326357-da5d-cd13-c882-81992d7b6ec3 (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8981a9e0a400, cur 1549601722 expire 1549601572 last 1549601495 [84423.640669] Lustre: Skipped 1 previous similar message [84672.865868] Lustre: 21458:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549601960/real 1549601960] req@ffff896239aba700 x1624787246506112/t0(0) o104->fir-MDT0000@10.8.24.24@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549601971 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 [84672.893124] Lustre: 21458:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 34 previous similar messages [84716.463586] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [84716.473628] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [84716.483890] Lustre: Skipped 3 previous similar messages [84749.905800] Lustre: 21458:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549602037/real 1549602037] req@ffff896239aba700 x1624787246506112/t0(0) o104->fir-MDT0000@10.8.24.24@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549602048 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [84749.933074] Lustre: 21458:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 6 previous similar messages [84773.627400] Lustre: fir-MDT0000: haven't heard from client 77681660-aec1-84f0-1a08-90e587b494ae (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8961e3a9f800, cur 1549602072 expire 1549601922 last 1549601845 [84773.649256] Lustre: Skipped 2 previous similar messages [85134.637544] Lustre: fir-MDT0000: haven't heard from client 96635ce8-2ebc-524b-8beb-c1453d53539c (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff897185373800, cur 1549602433 expire 1549602283 last 1549602206 [85134.659255] Lustre: Skipped 2 previous similar messages [85370.534378] Lustre: 21497:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549602657/real 1549602657] req@ffff898169e66600 x1624787251440368/t0(0) o104->fir-MDT0000@10.8.24.24@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549602668 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 [85370.561630] Lustre: 21497:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages [85447.573714] LustreError: 21497:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.24.24@o2ib6) returned error from blocking AST (req@ffff898169e66600 x1624787251440368 status -107 rc -107), evict it ns: mdt-fir-MDT0000_UUID lock: ffff896661b1af40/0x223c295f0d3e7c58 lrc: 4/0,0 mode: PR/PR res: [0x200000406:0x4a8:0x0].0x0 bits 0x13/0x0 rrc: 122 type: IBT flags: 0x60200400000020 nid: 10.8.24.24@o2ib6 remote: 0x61766ba498091a39 expref: 11 pid: 21744 timeout: 85594 lvb_type: 0 [85447.616787] LustreError: 138-a: fir-MDT0000: A client on nid 10.8.24.24@o2ib6 was evicted due to a lock blocking callback time out: rc -107 [85447.629323] LustreError: 20641:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 88s: evicting client at 10.8.24.24@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff896661b1af40/0x223c295f0d3e7c58 lrc: 3/0,0 mode: PR/PR res: [0x200000406:0x4a8:0x0].0x0 bits 0x13/0x0 rrc: 122 type: IBT flags: 0x60200400000020 nid: 10.8.24.24@o2ib6 remote: 0x61766ba498091a39 expref: 12 pid: 21744 timeout: 0 lvb_type: 0 [85472.562576] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [85472.572609] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [85472.582873] Lustre: Skipped 6 previous similar messages [85485.648587] Lustre: fir-MDT0002: haven't heard from client 
f334b24e-a646-8fdc-68db-c30dc603b6ba (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896e6a679400, cur 1549602784 expire 1549602634 last 1549602557 [85485.670286] Lustre: Skipped 2 previous similar messages [86107.663879] Lustre: 21267:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549603395/real 1549603395] req@ffff897082b27200 x1624787254595376/t0(0) o104->fir-MDT0000@10.8.24.24@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549603406 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 [86107.691156] Lustre: 21267:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 7 previous similar messages [86228.661659] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [86228.671685] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [86228.681949] Lustre: Skipped 9 previous similar messages [86250.704486] LustreError: 21267:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.24.24@o2ib6) failed to reply to blocking AST (req@ffff897082b27200 x1624787254595376 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff89645f37dc40/0x223c295f124582f1 lrc: 4/0,0 mode: PR/PR res: [0x200000406:0x4a8:0x0].0x0 bits 0x13/0x0 rrc: 131 type: IBT flags: 0x60200400000020 nid: 10.8.24.24@o2ib6 remote: 0xb961217ce1bcc956 expref: 11 pid: 21554 timeout: 86386 lvb_type: 0 [86250.747174] LustreError: 138-a: fir-MDT0000: A client on nid 10.8.24.24@o2ib6 was evicted due to a lock blocking callback time out: rc -110 [86250.759705] LustreError: 20641:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 154s: evicting client at 10.8.24.24@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff89645f37dc40/0x223c295f124582f1 lrc: 3/0,0 mode: PR/PR res: [0x200000406:0x4a8:0x0].0x0 bits 0x13/0x0 rrc: 131 type: IBT flags: 0x60200400000020 nid: 10.8.24.24@o2ib6 remote: 0xb961217ce1bcc956 expref: 12 pid: 21554 
timeout: 0 lvb_type: 0 [86294.666731] Lustre: fir-MDT0002: haven't heard from client 42e22f45-bd1d-4bb9-0fd3-181f9a050833 (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89814b32a800, cur 1549603593 expire 1549603443 last 1549603366 [86294.688438] Lustre: Skipped 4 previous similar messages [86937.582699] Lustre: 21680:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549604224/real 1549604224] req@ffff896e6534c500 x1624787257292624/t0(0) o104->fir-MDT0000@10.8.24.24@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549604235 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 [86937.609955] Lustre: 21680:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 13 previous similar messages [86984.760796] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [86984.770822] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [86984.781084] Lustre: Skipped 6 previous similar messages [87003.623558] LustreError: 21680:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.24.24@o2ib6) returned error from blocking AST (req@ffff896e6534c500 x1624787257292624 status -107 rc -107), evict it ns: mdt-fir-MDT0000_UUID lock: ffff8963bc79e540/0x223c295f16b16e8f lrc: 4/0,0 mode: PR/PR res: [0x200000406:0x4a8:0x0].0x0 bits 0x13/0x0 rrc: 92 type: IBT flags: 0x60200400000020 nid: 10.8.24.24@o2ib6 remote: 0x2045593dcc272d02 expref: 11 pid: 21267 timeout: 87150 lvb_type: 0 [87003.666505] LustreError: 138-a: fir-MDT0000: A client on nid 10.8.24.24@o2ib6 was evicted due to a lock blocking callback time out: rc -107 [87003.679042] LustreError: 20641:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 77s: evicting client at 10.8.24.24@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8963bc79e540/0x223c295f16b16e8f lrc: 3/0,0 mode: PR/PR res: [0x200000406:0x4a8:0x0].0x0 bits 0x13/0x0 rrc: 92 type: IBT flags: 
0x60200400000020 nid: 10.8.24.24@o2ib6 remote: 0x2045593dcc272d02 expref: 12 pid: 21267 timeout: 0 lvb_type: 0 [87072.688676] Lustre: fir-MDT0002: haven't heard from client a12f155c-d959-1425-3324-a178a0b274e9 (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896344268c00, cur 1549604371 expire 1549604221 last 1549604144 [87072.710381] Lustre: Skipped 4 previous similar messages [87740.860105] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [87740.870131] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [87740.880396] Lustre: Skipped 6 previous similar messages [87796.703414] Lustre: fir-MDT0000: haven't heard from client 7c6ff56a-b280-3774-bebf-d3f3f186a5af (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff898137306000, cur 1549605095 expire 1549604945 last 1549604868 [87796.725117] Lustre: Skipped 4 previous similar messages [88496.959417] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [88496.969449] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [88496.979713] Lustre: Skipped 6 previous similar messages [88606.723589] Lustre: fir-MDT0000: haven't heard from client aa087276-f80b-9a84-ce53-72a824e5a422 (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff89713e369c00, cur 1549605905 expire 1549605755 last 1549605678 [88606.745297] Lustre: Skipped 5 previous similar messages [88836.999369] Lustre: 21395:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549606124/real 1549606124] req@ffff89814bb7a400 x1624787260459184/t0(0) o104->fir-MDT0000@10.8.24.24@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549606135 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 [88837.026625] Lustre: 21395:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 6 previous similar messages [88914.038299] Lustre: 21395:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549606201/real 1549606201] req@ffff89814bb7a400 x1624787260459184/t0(0) o104->fir-MDT0000@10.8.24.24@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549606212 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [88914.065557] Lustre: 21395:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 6 previous similar messages [88980.076991] LustreError: 21395:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.24.24@o2ib6) failed to reply to blocking AST (req@ffff89814bb7a400 x1624787260459184 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff897740add340/0x223c295f20ad33e6 lrc: 4/0,0 mode: PR/PR res: [0x200000406:0x4a8:0x0].0x0 bits 0x13/0x0 rrc: 130 type: IBT flags: 0x60200400000020 nid: 10.8.24.24@o2ib6 remote: 0x6f06345950bff2de expref: 11 pid: 21267 timeout: 89116 lvb_type: 0 [88980.119675] LustreError: 138-a: fir-MDT0000: A client on nid 10.8.24.24@o2ib6 was evicted due to a lock blocking callback time out: rc -110 [88980.132208] LustreError: 20641:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 154s: evicting client at 10.8.24.24@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff897740add340/0x223c295f20ad33e6 lrc: 3/0,0 mode: PR/PR res: [0x200000406:0x4a8:0x0].0x0 bits 0x13/0x0 rrc: 130 type: IBT flags: 0x60200400000020 nid: 10.8.24.24@o2ib6 remote: 0x6f06345950bff2de 
expref: 12 pid: 21267 timeout: 0 lvb_type: 0 [89253.058720] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [89253.068748] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [89253.079014] Lustre: Skipped 6 previous similar messages [89262.611051] Lustre: 21517:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549606549/real 1549606549] req@ffff896e5ff34e00 x1624787260951088/t0(0) o104->fir-MDT0000@10.8.24.24@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549606560 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 [89262.638329] Lustre: 21517:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 6 previous similar messages [89405.652660] LustreError: 21517:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.24.24@o2ib6) failed to reply to blocking AST (req@ffff896e5ff34e00 x1624787260951088 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff8962d2e10900/0x223c295f22a1d25c lrc: 4/0,0 mode: PR/PR res: [0x200000406:0x4a8:0x0].0x0 bits 0x13/0x0 rrc: 130 type: IBT flags: 0x60200400000020 nid: 10.8.24.24@o2ib6 remote: 0xc9e52e5ac976d0d7 expref: 11 pid: 21644 timeout: 89541 lvb_type: 0 [89405.695345] LustreError: 138-a: fir-MDT0000: A client on nid 10.8.24.24@o2ib6 was evicted due to a lock blocking callback time out: rc -110 [89405.707881] LustreError: 20641:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 154s: evicting client at 10.8.24.24@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8962d2e10900/0x223c295f22a1d25c lrc: 3/0,0 mode: PR/PR res: [0x200000406:0x4a8:0x0].0x0 bits 0x13/0x0 rrc: 130 type: IBT flags: 0x60200400000020 nid: 10.8.24.24@o2ib6 remote: 0xc9e52e5ac976d0d7 expref: 12 pid: 21644 timeout: 0 lvb_type: 0 [89437.745712] Lustre: fir-MDT0002: haven't heard from client a4e26a71-e5b0-4f45-9c77-640b55726707 (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8962ac7a2400, cur 1549606736 expire 1549606586 last 1549606509 [89437.767416] Lustre: Skipped 4 previous similar messages [89675.401408] Lustre: 21389:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549606962/real 1549606962] req@ffff8962fd30ec00 x1624787261442624/t0(0) o104->fir-MDT0000@10.8.24.24@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549606973 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 [89675.428658] Lustre: 21389:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 13 previous similar messages [89818.443018] LustreError: 21389:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.24.24@o2ib6) failed to reply to blocking AST (req@ffff8962fd30ec00 x1624787261442624 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff898731a945c0/0x223c295f247a3719 lrc: 4/0,0 mode: PR/PR res: [0x200000406:0x4a8:0x0].0x0 bits 0x13/0x0 rrc: 133 type: IBT flags: 0x60200400000020 nid: 10.8.24.24@o2ib6 remote: 0xf207a7ca23f0c723 expref: 11 pid: 21398 timeout: 89954 lvb_type: 0 [89818.485703] LustreError: 138-a: fir-MDT0000: A client on nid 10.8.24.24@o2ib6 was evicted due to a lock blocking callback time out: rc -110 [89818.498241] LustreError: 20641:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 154s: evicting client at 10.8.24.24@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff898731a945c0/0x223c295f247a3719 lrc: 3/0,0 mode: PR/PR res: [0x200000406:0x4a8:0x0].0x0 bits 0x13/0x0 rrc: 133 type: IBT flags: 0x60200400000020 nid: 10.8.24.24@o2ib6 remote: 0xf207a7ca23f0c723 expref: 12 pid: 21398 timeout: 0 lvb_type: 0 [89882.067280] Lustre: MGS: Connection restored to 38f6d14e-7b39-af46-3df0-aa87488916e0 (at 10.8.24.24@o2ib6) [89882.076937] Lustre: Skipped 3 previous similar messages [90009.158061] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [90713.776589] Lustre: fir-MDT0000: haven't heard from client a02c9a33-1832-681b-7250-dba159934e58 
(at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89636af6e000, cur 1549608012 expire 1549607862 last 1549607785 [90713.798310] Lustre: Skipped 3 previous similar messages [90765.241440] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [90765.251470] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [90765.261730] Lustre: Skipped 3 previous similar messages [91048.645869] Lustre: 21265:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549608335/real 1549608335] req@ffff896e39f76600 x1624787263007568/t0(0) o104->fir-MDT0000@10.8.24.24@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549608346 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 [91048.673134] Lustre: 21265:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 13 previous similar messages [91100.786285] Lustre: fir-MDT0000: haven't heard from client 33a3b0d6-437d-fd03-a2f1-cb6dcc51527a (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8961a53f2400, cur 1549608399 expire 1549608249 last 1549608172 [91100.807991] Lustre: Skipped 2 previous similar messages [91319.894674] Lustre: 21554:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549608607/real 1549608607] req@ffff8961a5e6f500 x1624787263260624/t0(0) o104->fir-MDT0000@10.8.24.24@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549608618 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 [91319.921934] Lustre: 21554:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages [91380.344197] Lustre: MGS: Connection restored to 38f6d14e-7b39-af46-3df0-aa87488916e0 (at 10.8.24.24@o2ib6) [91380.353856] Lustre: Skipped 6 previous similar messages [91385.933567] LustreError: 21554:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.24.24@o2ib6) returned error from blocking AST (req@ffff8961a5e6f500 x1624787263260624 status -107 rc -107), evict it ns: mdt-fir-MDT0000_UUID lock: ffff8980fdbafbc0/0x223c295f2c0e9241 lrc: 4/0,0 mode: PR/PR res: [0x200000406:0x4a8:0x0].0x0 bits 0x13/0x0 rrc: 99 type: IBT flags: 0x60200400000020 nid: 10.8.24.24@o2ib6 remote: 0xdda217faf22f0da7 expref: 11 pid: 21568 timeout: 91532 lvb_type: 0 [91385.976517] LustreError: 138-a: fir-MDT0000: A client on nid 10.8.24.24@o2ib6 was evicted due to a lock blocking callback time out: rc -107 [91385.989058] LustreError: 20641:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 77s: evicting client at 10.8.24.24@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8980fdbafbc0/0x223c295f2c0e9241 lrc: 3/0,0 mode: PR/PR res: [0x200000406:0x4a8:0x0].0x0 bits 0x13/0x0 rrc: 100 type: IBT flags: 0x60200400000020 nid: 10.8.24.24@o2ib6 remote: 0xdda217faf22f0da7 expref: 12 pid: 21568 timeout: 0 lvb_type: 0 [91422.794343] Lustre: fir-MDT0002: haven't heard from client 40f1a391-784c-2133-bdd6-a11542373419 (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8962d32aa800, cur 1549608721 expire 1549608571 last 1549608494 [91422.816046] Lustre: Skipped 2 previous similar messages [91521.340843] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [91833.805508] Lustre: fir-MDT0000: haven't heard from client 8e861b97-eb9a-87a3-2285-4c6cce5e0ddb (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896e9ce3a400, cur 1549609132 expire 1549608982 last 1549608905 [91833.827217] Lustre: Skipped 4 previous similar messages [92214.601464] Lustre: MGS: Connection restored to 2cd32b21-e88a-1342-1ed4-a9da4a6c9649 (at 10.8.13.14@o2ib6) [92214.611121] Lustre: Skipped 9 previous similar messages [92277.424407] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [92600.824008] Lustre: fir-MDT0000: haven't heard from client eb2186f6-3860-a408-fa50-8530344fa58e (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8963596dc800, cur 1549609899 expire 1549609749 last 1549609672 [92600.846032] Lustre: Skipped 8 previous similar messages [92889.339218] Lustre: MGS: Connection restored to 2cd32b21-e88a-1342-1ed4-a9da4a6c9649 (at 10.8.13.14@o2ib6) [92889.348868] Lustre: Skipped 9 previous similar messages [93033.507870] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [93071.527619] Lustre: 21606:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549610362/real 1549610362] req@ffff896e74fecb00 x1624787265821632/t0(0) o104->fir-MDT0000@10.8.13.14@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549610369 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 [93071.554888] Lustre: 21606:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 6 previous similar messages [93092.565154] Lustre: 21606:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549610383/real 1549610383] req@ffff896e74fecb00 x1624787265821632/t0(0) o104->fir-MDT0000@10.8.13.14@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549610390 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [93092.592407] Lustre: 21606:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages [93134.603210] Lustre: 21606:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549610425/real 1549610425] req@ffff896e74fecb00 x1624787265821632/t0(0) o104->fir-MDT0000@10.8.13.14@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549610432 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [93134.630461] Lustre: 21606:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages [93211.642137] Lustre: 21606:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549610502/real 1549610502] req@ffff896e74fecb00 x1624787265821632/t0(0) o104->fir-MDT0000@10.8.13.14@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549610509 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 
[93211.669384] Lustre: 21606:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 10 previous similar messages [93218.679335] LustreError: 21606:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.13.14@o2ib6) failed to reply to blocking AST (req@ffff896e74fecb00 x1624787265821632 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff896e63faaf40/0x223c295f3322b042 lrc: 4/0,0 mode: PR/PR res: [0x200000406:0x4a8:0x0].0x0 bits 0x13/0x0 rrc: 125 type: IBT flags: 0x60200400000020 nid: 10.8.13.14@o2ib6 remote: 0x55d07ee457fbc700 expref: 16 pid: 21529 timeout: 93358 lvb_type: 0 [93218.722018] LustreError: 138-a: fir-MDT0000: A client on nid 10.8.13.14@o2ib6 was evicted due to a lock blocking callback time out: rc -110 [93218.734552] LustreError: 20641:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 154s: evicting client at 10.8.13.14@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff896e63faaf40/0x223c295f3322b042 lrc: 3/0,0 mode: PR/PR res: [0x200000406:0x4a8:0x0].0x0 bits 0x13/0x0 rrc: 125 type: IBT flags: 0x60200400000020 nid: 10.8.13.14@o2ib6 remote: 0x55d07ee457fbc700 expref: 17 pid: 21529 timeout: 0 lvb_type: 0 [93268.840805] Lustre: fir-MDT0002: haven't heard from client e38b2e9b-cf73-357b-ec68-7d3e04ce28b9 (at 10.8.13.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff896dcc629400, cur 1549610567 expire 1549610417 last 1549610340 [93268.862514] Lustre: Skipped 8 previous similar messages [93548.571595] Lustre: 21506:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549610839/real 1549610839] req@ffff896de326b300 x1624787266481152/t0(0) o104->fir-MDT0000@10.8.13.14@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549610846 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 [93548.598852] Lustre: 21506:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message [93569.504133] Lustre: MGS: Connection restored to 38f6d14e-7b39-af46-3df0-aa87488916e0 (at 10.8.24.24@o2ib6) [93569.513783] Lustre: Skipped 9 previous similar messages [93695.614307] LustreError: 21506:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.13.14@o2ib6) failed to reply to blocking AST (req@ffff896de326b300 x1624787266481152 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff8962cbb67980/0x223c295f34390b1c lrc: 4/0,0 mode: PR/PR res: [0x200000406:0x4a8:0x0].0x0 bits 0x13/0x0 rrc: 129 type: IBT flags: 0x60200400000020 nid: 10.8.13.14@o2ib6 remote: 0xd48bfe95c3a7cba0 expref: 16 pid: 21672 timeout: 93835 lvb_type: 0 [93695.656994] LustreError: 138-a: fir-MDT0000: A client on nid 10.8.13.14@o2ib6 was evicted due to a lock blocking callback time out: rc -110 [93695.669532] LustreError: 20641:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 154s: evicting client at 10.8.13.14@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8962cbb67980/0x223c295f34390b1c lrc: 3/0,0 mode: PR/PR res: [0x200000406:0x4a8:0x0].0x0 bits 0x13/0x0 rrc: 129 type: IBT flags: 0x60200400000020 nid: 10.8.13.14@o2ib6 remote: 0xd48bfe95c3a7cba0 expref: 17 pid: 21672 timeout: 0 lvb_type: 0 [93789.592356] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [93949.857903] Lustre: fir-MDT0000: haven't heard from client 29b93295-7901-c8e8-73b0-9590371f4106 
(at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8961e0310800, cur 1549611248 expire 1549611098 last 1549611021 [93949.879604] Lustre: Skipped 6 previous similar messages [94060.913452] Lustre: 21762:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549611352/real 1549611352] req@ffff89813ca5ef00 x1624787267328656/t0(0) o104->fir-MDT0000@10.8.13.14@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549611359 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 [94060.940709] Lustre: 21762:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 21 previous similar messages [94207.957157] LustreError: 21762:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.13.14@o2ib6) failed to reply to blocking AST (req@ffff89813ca5ef00 x1624787267328656 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff8975b1fcaf40/0x223c295f35a8ce7c lrc: 4/0,0 mode: PR/PR res: [0x200000406:0x4a8:0x0].0x0 bits 0x13/0x0 rrc: 129 type: IBT flags: 0x60200400000020 nid: 10.8.13.14@o2ib6 remote: 0x187b256db237fcfd expref: 16 pid: 21063 timeout: 94347 lvb_type: 0 [94207.999842] LustreError: 138-a: fir-MDT0000: A client on nid 10.8.13.14@o2ib6 was evicted due to a lock blocking callback time out: rc -110 [94208.012393] LustreError: 20641:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 154s: evicting client at 10.8.13.14@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8975b1fcaf40/0x223c295f35a8ce7c lrc: 3/0,0 mode: PR/PR res: [0x200000406:0x4a8:0x0].0x0 bits 0x13/0x0 rrc: 129 type: IBT flags: 0x60200400000020 nid: 10.8.13.14@o2ib6 remote: 0x187b256db237fcfd expref: 17 pid: 21063 timeout: 0 lvb_type: 0 [94243.244917] Lustre: MGS: Connection restored to 38f6d14e-7b39-af46-3df0-aa87488916e0 (at 10.8.24.24@o2ib6) [94243.254570] Lustre: Skipped 9 previous similar messages [94545.675796] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [94572.874194] 
Lustre: fir-MDT0000: haven't heard from client eb065c77-df93-6773-f682-b03b3145b79b (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896ec9a04c00, cur 1549611871 expire 1549611721 last 1549611644 [94572.895894] Lustre: Skipped 7 previous similar messages [94661.571521] Lustre: 21789:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549611952/real 1549611952] req@ffff8980d9291b00 x1624787268138400/t0(0) o104->fir-MDT0000@10.8.13.14@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549611959 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [94661.598778] Lustre: 21789:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 24 previous similar messages [94787.614708] LustreError: 21789:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.13.14@o2ib6) failed to reply to blocking AST (req@ffff8980d9291b00 x1624787268138400 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff89814b33d7c0/0x223c295f37165d21 lrc: 4/0,0 mode: PR/PR res: [0x200000406:0x4a8:0x0].0x0 bits 0x13/0x0 rrc: 130 type: IBT flags: 0x60200400000020 nid: 10.8.13.14@o2ib6 remote: 0xbc8f3c0f52cec60 expref: 16 pid: 21503 timeout: 94927 lvb_type: 0 [94787.657306] LustreError: 138-a: fir-MDT0000: A client on nid 10.8.13.14@o2ib6 was evicted due to a lock blocking callback time out: rc -110 [94787.669854] LustreError: 20641:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 154s: evicting client at 10.8.13.14@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff89814b33d7c0/0x223c295f37165d21 lrc: 3/0,0 mode: PR/PR res: [0x200000406:0x4a8:0x0].0x0 bits 0x13/0x0 rrc: 130 type: IBT flags: 0x60200400000020 nid: 10.8.13.14@o2ib6 remote: 0xbc8f3c0f52cec60 expref: 17 pid: 21503 timeout: 0 lvb_type: 0 [94954.855546] Lustre: MGS: Connection restored to 2cd32b21-e88a-1342-1ed4-a9da4a6c9649 (at 10.8.13.14@o2ib6) [94954.865199] Lustre: Skipped 9 previous similar messages [95301.759378] Lustre: fir-MDT0002: Client 
0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [95409.895042] Lustre: fir-MDT0000: haven't heard from client d8330a34-757b-e223-e90f-0d0c4f8db623 (at 10.8.13.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8980defa4c00, cur 1549612708 expire 1549612558 last 1549612481 [95409.916745] Lustre: Skipped 7 previous similar messages [95559.577301] Lustre: MGS: Connection restored to 2cd32b21-e88a-1342-1ed4-a9da4a6c9649 (at 10.8.13.14@o2ib6) [95559.586958] Lustre: Skipped 9 previous similar messages [95770.981365] Lustre: 21703:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549613062/real 1549613062] req@ffff896e02af2100 x1624787269841664/t0(0) o104->fir-MDT0000@10.8.13.14@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549613069 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 [95771.008621] Lustre: 21703:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 18 previous similar messages [95918.027074] LustreError: 21703:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.13.14@o2ib6) failed to reply to blocking AST (req@ffff896e02af2100 x1624787269841664 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff89714931c140/0x223c295f3a0bad75 lrc: 4/0,0 mode: PR/PR res: [0x200000406:0x4a8:0x0].0x0 bits 0x13/0x0 rrc: 130 type: IBT flags: 0x60200400000020 nid: 10.8.13.14@o2ib6 remote: 0x1daf01192b29379a expref: 17 pid: 21730 timeout: 96057 lvb_type: 0 [95918.069793] LustreError: 138-a: fir-MDT0000: A client on nid 10.8.13.14@o2ib6 was evicted due to a lock blocking callback time out: rc -110 [95918.082329] LustreError: 20641:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 154s: evicting client at 10.8.13.14@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff89714931c140/0x223c295f3a0bad75 lrc: 3/0,0 mode: PR/PR res: [0x200000406:0x4a8:0x0].0x0 bits 0x13/0x0 rrc: 130 type: IBT flags: 0x60200400000020 nid: 10.8.13.14@o2ib6 remote: 0x1daf01192b29379a expref: 
18 pid: 21730 timeout: 0 lvb_type: 0 [96057.842890] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [96349.917888] Lustre: fir-MDT0000: haven't heard from client 9fd68817-2c32-1c26-f219-0a50027d0e46 (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8980d3a7c400, cur 1549613648 expire 1549613498 last 1549613421 [96349.939591] Lustre: Skipped 10 previous similar messages [96387.040448] Lustre: MGS: Connection restored to 38f6d14e-7b39-af46-3df0-aa87488916e0 (at 10.8.24.24@o2ib6) [96387.050105] Lustre: Skipped 9 previous similar messages [96813.926324] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [97006.734606] Lustre: MGS: Connection restored to 2cd32b21-e88a-1342-1ed4-a9da4a6c9649 (at 10.8.13.14@o2ib6) [97006.744261] Lustre: Skipped 9 previous similar messages [97153.938505] Lustre: fir-MDT0000: haven't heard from client a7f7b397-f533-f7fc-7306-41b846bf62dc (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897049655400, cur 1549614452 expire 1549614302 last 1549614225 [97153.960214] Lustre: Skipped 11 previous similar messages [97570.009769] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [97793.954226] Lustre: fir-MDT0000: haven't heard from client cd48723d-a5d0-fe31-3118-70b967188cc9 (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8980c06f5000, cur 1549615092 expire 1549614942 last 1549614865 [97793.975948] Lustre: Skipped 8 previous similar messages [97842.580701] Lustre: MGS: Connection restored to 38f6d14e-7b39-af46-3df0-aa87488916e0 (at 10.8.24.24@o2ib6) [97842.590356] Lustre: Skipped 12 previous similar messages [98326.094218] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [98427.970089] Lustre: fir-MDT0000: haven't heard from client 860524ba-dffc-de94-346a-f2cce00d49b2 (at 10.8.13.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8970243ab800, cur 1549615726 expire 1549615576 last 1549615499 [98427.992046] Lustre: Skipped 8 previous similar messages [98468.849713] Lustre: MGS: Connection restored to 38f6d14e-7b39-af46-3df0-aa87488916e0 (at 10.8.24.24@o2ib6) [98468.859364] Lustre: Skipped 9 previous similar messages [99082.177739] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [99082.187766] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [99082.198031] Lustre: Skipped 8 previous similar messages [99103.997173] Lustre: fir-MDT0002: haven't heard from client bb868ea3-a451-092a-8e8a-b5876bfa4f87 (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8980be763400, cur 1549616402 expire 1549616252 last 1549616175 [99104.018881] Lustre: Skipped 11 previous similar messages [99797.004401] Lustre: fir-MDT0000: haven't heard from client 6b20a836-6d43-1944-5a04-72caa46fa0d9 (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff896d82a16c00, cur 1549617095 expire 1549616945 last 1549616868 [99797.026103] Lustre: Skipped 8 previous similar messages [99838.277199] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [99838.287227] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [99838.297492] Lustre: Skipped 12 previous similar messages [100472.039583] Lustre: fir-MDT0000: haven't heard from client d4ed62bf-a160-1b46-956f-77e105934da5 (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8995ade46400, cur 1549617770 expire 1549617620 last 1549617543 [100472.061387] Lustre: Skipped 11 previous similar messages [100513.441071] Lustre: MGS: Connection restored to 38f6d14e-7b39-af46-3df0-aa87488916e0 (at 10.8.24.24@o2ib6) [100513.450808] Lustre: Skipped 12 previous similar messages [100594.376762] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [101271.041503] Lustre: fir-MDT0000: haven't heard from client db60825a-70ed-189e-08b7-fbf3aad6f57b (at 10.8.13.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89615ff5d800, cur 1549618569 expire 1549618419 last 1549618342 [101271.063291] Lustre: Skipped 8 previous similar messages [101350.461135] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [101350.471253] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [101350.481601] Lustre: Skipped 9 previous similar messages [102010.059981] Lustre: fir-MDT0000: haven't heard from client 6295dd24-b743-5148-9e0c-33fcf1716e35 (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff896d6faa8800, cur 1549619308 expire 1549619158 last 1549619081 [102010.081769] Lustre: Skipped 5 previous similar messages [102058.717924] Lustre: MGS: Connection restored to 38f6d14e-7b39-af46-3df0-aa87488916e0 (at 10.8.24.24@o2ib6) [102058.727661] Lustre: Skipped 6 previous similar messages [102106.560607] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [102686.583875] Lustre: MGS: Connection restored to 2cd32b21-e88a-1342-1ed4-a9da4a6c9649 (at 10.8.13.14@o2ib6) [102686.593623] Lustre: Skipped 9 previous similar messages [102726.080860] Lustre: fir-MDT0002: haven't heard from client ee20ee73-1fa1-34d8-be54-7c5825e700c9 (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8963596e4c00, cur 1549620024 expire 1549619874 last 1549619797 [102726.102651] Lustre: Skipped 11 previous similar messages [102862.644131] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [103323.956323] Lustre: MGS: Connection restored to 2cd32b21-e88a-1342-1ed4-a9da4a6c9649 (at 10.8.13.14@o2ib6) [103323.966066] Lustre: Skipped 9 previous similar messages [103328.093296] Lustre: fir-MDT0000: haven't heard from client cead4c58-7f61-0718-f376-e434ca963035 (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff898186a16c00, cur 1549620626 expire 1549620476 last 1549620399 [103328.115088] Lustre: Skipped 8 previous similar messages [103618.727717] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [104051.111844] Lustre: MGS: haven't heard from client a9b52a32-863d-638d-547c-6c68982c8918 (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff89959721cc00, cur 1549621349 expire 1549621199 last 1549621122 [104051.132940] Lustre: Skipped 8 previous similar messages [104104.916737] Lustre: MGS: Connection restored to 38f6d14e-7b39-af46-3df0-aa87488916e0 (at 10.8.24.24@o2ib6) [104104.926482] Lustre: Skipped 12 previous similar messages [104374.811292] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [104738.128971] Lustre: fir-MDT0000: haven't heard from client 4ce8d0a0-989a-816d-466e-a14547c2a88e (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896d4ae8d400, cur 1549622036 expire 1549621886 last 1549621809 [104738.150759] Lustre: Skipped 8 previous similar messages [104802.300264] Lustre: MGS: Connection restored to 38f6d14e-7b39-af46-3df0-aa87488916e0 (at 10.8.24.24@o2ib6) [104802.310008] Lustre: Skipped 9 previous similar messages [105130.894841] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [105371.144270] Lustre: fir-MDT0000: haven't heard from client da6468f6-fc36-bbee-3c9a-08c131d0e1c9 (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89615ba83c00, cur 1549622669 expire 1549622519 last 1549622442 [105371.166059] Lustre: Skipped 8 previous similar messages [105472.838353] Lustre: MGS: Connection restored to 2cd32b21-e88a-1342-1ed4-a9da4a6c9649 (at 10.8.13.14@o2ib6) [105472.848101] Lustre: Skipped 9 previous similar messages [105886.949297] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [105974.159558] Lustre: fir-MDT0002: haven't heard from client a64758d8-a08d-7231-7587-dfa502a2a999 (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8961456f9800, cur 1549623272 expire 1549623122 last 1549623045 [105974.181351] Lustre: Skipped 11 previous similar messages [106547.875541] Lustre: MGS: Connection restored to 2cd32b21-e88a-1342-1ed4-a9da4a6c9649 (at 10.8.13.14@o2ib6) [106547.885285] Lustre: Skipped 15 previous similar messages [106643.033208] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [106852.182698] Lustre: fir-MDT0002: haven't heard from client cd64f71c-5146-a654-de62-6e8815a80882 (at 10.8.13.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896d383a7000, cur 1549624150 expire 1549624000 last 1549623923 [106852.204489] Lustre: Skipped 5 previous similar messages [107399.126834] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [107399.136949] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [107399.147300] Lustre: Skipped 6 previous similar messages [107973.211402] Lustre: fir-MDT0000: haven't heard from client f1f2da0e-310f-09bd-f158-ee35331eec25 (at 10.8.13.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff896d30a3b400, cur 1549625271 expire 1549625121 last 1549625044 [107973.233196] Lustre: Skipped 5 previous similar messages [108155.236693] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [108155.246868] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [108155.257235] Lustre: Skipped 3 previous similar messages [108593.427127] Lustre: 20933:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549625290/real 1549625290] req@ffff896d35e85400 x1624787282472624/t0(0) o5->fir-OST0007-osc-MDT0000@10.0.10.102@o2ib7:28/4 lens 432/432 e 14 to 1 dl 1549625891 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1 [108593.455502] Lustre: 20933:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 21 previous similar messages [108593.465422] Lustre: fir-OST0007-osc-MDT0000: Connection to fir-OST0007 (at 10.0.10.102@o2ib7) was lost; in progress operations using this service will wait for recovery to complete [108593.481665] LustreError: 20933:0:(osp_precreate.c:656:osp_precreate_send()) fir-OST0007-osc-MDT0000: can't precreate: rc = -11 [108593.493144] LustreError: 20933:0:(osp_precreate.c:1312:osp_precreate_thread()) fir-OST0007-osc-MDT0000: cannot precreate objects: rc = -11 [108911.346522] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [108911.356650] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [108911.366993] Lustre: Skipped 7 previous similar messages [109061.239104] Lustre: fir-MDT0000: haven't heard from client 7e7613a0-c741-6bc2-0a9b-cc0f8672c911 (at 10.8.13.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8995a67bdc00, cur 1549626359 expire 1549626209 last 1549626132 [109061.260897] Lustre: Skipped 5 previous similar messages [109667.456604] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [109667.466719] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [109667.477069] Lustre: Skipped 3 previous similar messages [109689.004615] Lustre: 20933:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549626230/real 1549626230] req@ffff896d9e399800 x1624787284539792/t0(0) o5->fir-OST0007-osc-MDT0000@10.0.10.102@o2ib7:28/4 lens 432/432 e 0 to 1 dl 1549626986 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1 [109689.032907] Lustre: fir-OST0007-osc-MDT0000: Connection to fir-OST0007 (at 10.0.10.102@o2ib7) was lost; in progress operations using this service will wait for recovery to complete [109689.049131] LustreError: 20933:0:(osp_precreate.c:656:osp_precreate_send()) fir-OST0007-osc-MDT0000: can't precreate: rc = -11 [109689.060626] LustreError: 20933:0:(osp_precreate.c:1312:osp_precreate_thread()) fir-OST0007-osc-MDT0000: cannot precreate objects: rc = -11 [110047.263570] Lustre: MGS: haven't heard from client 1c2953a3-e3c8-ca87-1724-3de148191764 (at 10.8.13.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899549b57400, cur 1549627345 expire 1549627195 last 1549627118 [110047.284672] Lustre: Skipped 5 previous similar messages [110423.527540] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [110423.537655] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [110423.548001] Lustre: Skipped 7 previous similar messages [110780.279995] Lustre: fir-MDT0000: haven't heard from client e51393bd-8b0f-e1c1-ddb7-19a3f4b2de03 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8984b53b8c00, cur 1549628078 expire 1549627928 last 1549627851
[110780.301784] Lustre: Skipped 5 previous similar messages
[110832.939326] Lustre: 20538:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549627529/real 1549627529] req@ffff8997bd251b00 x1624787285438192/t0(0) o6->fir-OST0007-osc-MDT0002@10.0.10.102@o2ib7:28/4 lens 544/432 e 4 to 1 dl 1549628130 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[110832.967536] Lustre: fir-OST0007-osc-MDT0002: Connection to fir-OST0007 (at 10.0.10.102@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
[111179.637490] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[111179.647608] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[111179.657954] Lustre: Skipped 7 previous similar messages
[111434.299155] Lustre: fir-MDT0002: haven't heard from client 7f600eda-6661-aacb-ba39-8a9b4d161f85 (at 10.8.13.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it.
exp ffff898112e33400, cur 1549628732 expire 1549628582 last 1549628505
[111434.321015] Lustre: Skipped 2 previous similar messages
[111434.490422] Lustre: 20538:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549628130/real 1549628130] req@ffff8997bd251b00 x1624787285438192/t0(0) o6->fir-OST0007-osc-MDT0002@10.0.10.102@o2ib7:28/4 lens 544/432 e 4 to 1 dl 1549628731 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[111434.518630] Lustre: fir-OST0007-osc-MDT0002: Connection to fir-OST0007 (at 10.0.10.102@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
[111589.002302] Lustre: 21144:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549628130/real 1549628130] req@ffff8980feeda700 x1624787285784160/t0(0) o5->fir-OST0007-osc-MDT0002@10.0.10.102@o2ib7:28/4 lens 432/432 e 0 to 1 dl 1549628886 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
[111589.030620] LustreError: 21144:0:(osp_precreate.c:940:osp_precreate_cleanup_orphans()) fir-OST0007-osc-MDT0002: cannot cleanup orphans: rc = -107
[111805.219516] Lustre: MGS: Connection restored to 2cd32b21-e88a-1342-1ed4-a9da4a6c9649 (at 10.8.13.14@o2ib6)
[111805.229261] Lustre: Skipped 4 previous similar messages
[111935.747323] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[112035.561505] Lustre: 20538:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549628732/real 1549628732] req@ffff8997bd251b00 x1624787285438192/t0(0) o6->fir-OST0007-osc-MDT0002@10.0.10.102@o2ib7:28/4 lens 544/432 e 4 to 1 dl 1549629333 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[112035.589743] Lustre: fir-OST0007-osc-MDT0002: Connection to fir-OST0007 (at 10.0.10.102@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
[112239.325377] Lustre: MGS: haven't heard from client b239e3a7-bf96-7c8d-2cc5-b45b892abffd
(at 10.8.9.8@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8997083b3400, cur 1549629537 expire 1549629387 last 1549629310
[112239.346295] Lustre: Skipped 5 previous similar messages
[112636.440579] Lustre: 20538:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549629333/real 1549629333] req@ffff8997bd251b00 x1624787285438192/t0(0) o6->fir-OST0007-osc-MDT0002@10.0.10.102@o2ib7:28/4 lens 544/432 e 4 to 1 dl 1549629934 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[112636.468793] Lustre: fir-OST0007-osc-MDT0002: Connection to fir-OST0007 (at 10.0.10.102@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
[112636.485150] Lustre: fir-OST0007-osc-MDT0002: Connection restored to 10.0.10.102@o2ib7 (at 10.0.10.102@o2ib7)
[112636.495106] Lustre: Skipped 7 previous similar messages
[112691.841086] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[113238.199682] Lustre: 20538:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549629934/real 1549629934] req@ffff8997bd251b00 x1624787285438192/t0(0) o6->fir-OST0007-osc-MDT0002@10.0.10.102@o2ib7:28/4 lens 544/432 e 4 to 1 dl 1549630535 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[113238.227898] Lustre: fir-OST0007-osc-MDT0002: Connection to fir-OST0007 (at 10.0.10.102@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
[113238.244248] Lustre: fir-OST0007-osc-MDT0002: Connection restored to 10.0.10.102@o2ib7 (at 10.0.10.102@o2ib7)
[113238.254183] Lustre: Skipped 1 previous similar message
[113447.924525] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[114105.371774] Lustre: fir-MDT0002: haven't heard from client b05076ec-3749-5d30-abe1-ce9a780abfe3 (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it.
exp ffff897221e73000, cur 1549631403 expire 1549631253 last 1549631176
[114105.393488] Lustre: Skipped 2 previous similar messages
[114203.978609] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[114203.988783] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[114203.999134] Lustre: Skipped 1 previous similar message
[114289.368876] Lustre: fir-MDT0000: haven't heard from client 518aa574-d90a-7904-bdaa-63ec4fba6e7d (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8977e6ac3c00, cur 1549631587 expire 1549631437 last 1549631360
[114289.390659] Lustre: Skipped 2 previous similar messages
[114517.374719] Lustre: fir-MDT0000: haven't heard from client d57a34ad-2851-7c28-60eb-a05a8f83e813 (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8962d4fd1800, cur 1549631815 expire 1549631665 last 1549631588
[114517.396433] Lustre: Skipped 2 previous similar messages
[114960.087818] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[114960.097928] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[114960.108274] Lustre: Skipped 9 previous similar messages
[115716.187182] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[115716.197295] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[116472.281390] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[116472.291502] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[116754.687933] Lustre: 20493:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549633450/real 1549633450] req@ffff896145ee1800
x1624787289318800/t0(0) o6->fir-OST0007-osc-MDT0000@10.0.10.102@o2ib7:28/4 lens 544/432 e 3 to 1 dl 1549634052 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[116754.716201] Lustre: fir-OST0007-osc-MDT0000: Connection to fir-OST0007 (at 10.0.10.102@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
[117228.375412] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[117228.385525] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[117228.395872] Lustre: Skipped 1 previous similar message
[117984.484123] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[117984.494235] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[118740.588021] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[118740.598139] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[119496.653198] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[119496.663321] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[120252.757602] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[120252.767717] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[121008.823123] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[121008.833233] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[121764.927642] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[121764.937758] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21
(at 10.0.10.3@o2ib7)
[122070.563121] Lustre: MGS: haven't heard from client 58c319e2-128b-6d7f-bc22-e6642c995c3b (at 10.8.0.66@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8967d9329c00, cur 1549639368 expire 1549639218 last 1549639141
[122070.584137] Lustre: Skipped 2 previous similar messages
[122521.032130] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[122521.042247] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[123277.126004] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[123277.136120] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[124033.191371] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[124033.201480] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[124033.211824] Lustre: Skipped 3 previous similar messages
[124789.300891] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[124789.310997] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[125545.405301] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[125545.415416] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[126301.469923] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[126301.480033] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[126788.020707] LustreError: 20641:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.8.18.28@o2ib6 ns: mdt-fir-MDT0002_UUID lock:
ffff8972956c1440/0x223c295f5d0269ef lrc: 3/0,0 mode: PW/PW res: [0x2c0003bbf:0x3:0x0].0x0 bits 0x40/0x0 rrc: 13 type: IBT flags: 0x60200400000020 nid: 10.8.18.28@o2ib6 remote: 0x702710f2ca5dbb1e expref: 30 pid: 21736 timeout: 126784 lvb_type: 0
[127016.698904] Lustre: fir-MDT0002: haven't heard from client 43c491ee-b68b-5359-c63e-5195c978bbc4 (at 10.8.18.28@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8980ef665c00, cur 1549644314 expire 1549644164 last 1549644087
[127016.720701] Lustre: Skipped 2 previous similar messages
[127057.574502] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[127057.584625] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[127057.594968] Lustre: Skipped 1 previous similar message
[127813.683907] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[127813.694029] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[127813.704374] Lustre: Skipped 1 previous similar message
[128098.748967] Lustre: fir-MDT0000: haven't heard from client 638957ff-0257-7fa1-358e-65225566887c (at 10.8.3.30@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8977e6044400, cur 1549645396 expire 1549645246 last 1549645169
[128174.720736] Lustre: fir-MDT0000: haven't heard from client 5477c7e2-64cd-994a-9e68-58acd7f57ef5 (at 10.8.6.13@o2ib6) in 158 seconds. I think it's dead, and I am evicting it. exp ffff8977e6bb3c00, cur 1549645472 expire 1549645322 last 1549645314
[128174.742448] Lustre: Skipped 2 previous similar messages
[128243.730659] Lustre: MGS: haven't heard from client c9b1e3c1-f14b-e6bf-fc85-c4b5d867823b (at 10.8.6.13@o2ib6) in 227 seconds. I think it's dead, and I am evicting it.
exp ffff8967d8d0f400, cur 1549645541 expire 1549645391 last 1549645314
[128415.932094] Lustre: MGS: Connection restored to c9b1e3c1-f14b-e6bf-fc85-c4b5d867823b (at 10.8.6.13@o2ib6)
[128415.941753] Lustre: Skipped 3 previous similar messages
[128569.793177] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[129325.847471] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[129325.857587] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[129325.867933] Lustre: Skipped 3 previous similar messages
[130081.956547] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[130081.966655] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[130081.977001] Lustre: Skipped 3 previous similar messages
[130374.933718] Lustre: 20538:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549647665/real 1549647665] req@ffff8997f83de600 x1624787298201184/t0(0) o13->fir-OST0001-osc-MDT0000@10.0.10.102@o2ib7:7/4 lens 224/368 e 0 to 1 dl 1549647672 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[130374.961929] Lustre: fir-OST0001-osc-MDT0000: Connection to fir-OST0001 (at 10.0.10.102@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
[130376.213749] Lustre: 20523:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549647666/real 1549647666] req@ffff898611b3b300 x1624787298201376/t0(0) o13->fir-OST0005-osc-MDT0000@10.0.10.102@o2ib7:7/4 lens 224/368 e 0 to 1 dl 1549647673 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[130376.241959] Lustre: 20523:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message
[130376.251735] Lustre: fir-OST0005-osc-MDT0000: Connection to fir-OST0005 (at 10.0.10.102@o2ib7) was lost; in
progress operations using this service will wait for recovery to complete
[130376.267907] Lustre: Skipped 1 previous similar message
[130376.329749] LNetError: 20470:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: tx_queue, 0 seconds
[130376.339835] LNetError: 20470:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.102@o2ib7 (6): c: 0, oc: 0, rc: 8
[130377.329778] LNet: 20470:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.102@o2ib7: 1 seconds
[130377.685787] Lustre: 20530:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549647667/real 1549647667] req@ffff899632e8cb00 x1624787298201696/t0(0) o13->fir-OST0003-osc-MDT0000@10.0.10.102@o2ib7:7/4 lens 224/368 e 0 to 1 dl 1549647674 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[130377.713988] Lustre: 20530:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 7 previous similar messages
[130377.723823] Lustre: fir-OST0003-osc-MDT0000: Connection to fir-OST0003 (at 10.0.10.102@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
[130377.739997] Lustre: Skipped 7 previous similar messages
[130382.329903] LNet: 20470:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.102@o2ib7: 0 seconds
[130382.340158] LNet: 20470:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 3 previous similar messages
[130384.329947] LNet: 20470:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.102@o2ib7: 1 seconds
[130384.340198] LNet: 20470:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 2 previous similar messages
[130429.331082] LNet: 20470:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.102@o2ib7: 18 seconds
[130429.341430] LNet: 20470:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 1 previous similar message
[130434.331205] LNet: 20470:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.102@o2ib7: 0 seconds
[130434.341460] LNet:
20470:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 8 previous similar messages
[130480.332358] LNet: 20470:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.102@o2ib7: 2 seconds
[130480.342613] LNet: 20470:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 2 previous similar messages
[130529.333588] LNet: 20470:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.102@o2ib7: 18 seconds
[130529.343934] LNet: 20470:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 11 previous similar messages
[130572.781299] Lustre: fir-MDT0000: haven't heard from client fir-MDT0000-lwp-OST0009_UUID (at 10.0.10.102@o2ib7) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8997ecbce000, cur 1549647870 expire 1549647720 last 1549647643
[130572.802486] Lustre: Skipped 1 previous similar message
[130581.334891] LNet: 20470:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.102@o2ib7: 19 seconds
[130581.345236] LNet: 20470:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 11 previous similar messages
[130680.337380] LNet: 20470:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.102@o2ib7: 16 seconds
[130680.347728] LNet: 20470:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 23 previous similar messages
[130831.341175] LNet: 20470:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.102@o2ib7: 1 seconds
[130831.351431] LNet: 20470:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 35 previous similar messages
[130838.055520] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[130838.065637] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[131594.149330] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[131594.159445] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[131594.169795]
Lustre: Skipped 16 previous similar messages
[132037.832996] Lustre: fir-MDT0002: haven't heard from client 32eb5e08-1462-3916-9de2-5474e19b28d7 (at 10.8.4.35@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8987bf09cc00, cur 1549649335 expire 1549649185 last 1549649108
[132037.854703] Lustre: Skipped 12 previous similar messages
[132227.266578] Lustre: MGS: Connection restored to f5e251fd-41ed-44a2-3e79-229973fd6239 (at 10.8.4.35@o2ib6)
[132227.276237] Lustre: Skipped 15 previous similar messages
[132350.257912] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[132354.566320] Lustre: Failing over fir-MDT0000
[132354.570686] Lustre: Skipped 1 previous similar message
[132354.580089] LustreError: 21344:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.18.28@o2ib6 arrived at 1549649651 with bad export cookie 2466892185454478922
[132354.595641] LustreError: 21344:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 10 previous similar messages
[132354.596466] Lustre: fir-MDT0002: Not available for connect from 10.8.7.13@o2ib6 (stopping)
[132354.596468] Lustre: Skipped 7 previous similar messages
[132354.607145] Lustre: 21530:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (52924:78637s); client may timeout. req@ffff8977e2c47200 x1624748261185216/t25769804206(0) o36->bcdbcaa2-b5a0-6ff6-1390-a90accf35015@10.9.106.20@o2ib4:635/0 lens 504/424 e 0 to 0 dl 1549571014 ref 1 fl Complete:/0/0 rc -19/-19
[132354.607148] Lustre: 21530:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1 previous similar message
[132354.607256] LNet: Service thread pid 21530 completed after 131561.32s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
[132354.619338] LustreError: 11-0: fir-MDT0000-osp-MDT0002: operation mds_disconnect to node 0@lo failed: rc = -19
[132354.619350] LustreError: 53268:0:(osp_dev.c:485:osp_disconnect()) fir-MDT0000-osp-MDT0002: can't disconnect: rc = -19
[132354.623488] LustreError: 53268:0:(lod_dev.c:265:lod_sub_process_config()) fir-MDT0002-mdtlov: error cleaning up LOD index 0: cmd 0xcf031: rc = -19
[132355.101576] Lustre: fir-MDT0002: Not available for connect from 10.8.6.1@o2ib6 (stopping)
[132355.109849] Lustre: Skipped 133 previous similar messages
[132355.168703] LustreError: 20797:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8977e66ae800 ns: mdt-fir-MDT0000_UUID lock: ffff8973fde957c0/0x223c295f5fecb86f lrc: 3/0,0 mode: PW/PW res: [0x2000018c1:0x8cb:0x0].0x0 bits 0x40/0x0 rrc: 4 type: IBT flags: 0x50200000000000 nid: 10.8.26.15@o2ib6 remote: 0xd69ae504738c144f expref: 8862 pid: 20797 timeout: 0 lvb_type: 0
[132355.454454] LustreError: 21344:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.3.34@o2ib6 arrived at 1549649652 with bad export cookie 2466892173948357183
[132356.103762] Lustre: fir-MDT0000: Not available for connect from 10.9.107.8@o2ib4 (stopping)
[132356.112205] Lustre: Skipped 161 previous similar messages
[132357.151578] LustreError: 20529:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff89962033b600 x1624787302044240/t0(0) o41->fir-MDT0001-osp-MDT0002@10.0.10.52@o2ib7:24/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
[132357.173556] LustreError: 20529:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 3 previous similar messages
[132357.191961] LustreError: 20933:0:(osp_precreate.c:656:osp_precreate_send()) fir-OST0007-osc-MDT0000: can't precreate: rc = -5
[132357.203366] LustreError: 20933:0:(osp_precreate.c:1312:osp_precreate_thread()) fir-OST0007-osc-MDT0000: cannot precreate objects: rc = -5
[132358.163932] Lustre: fir-MDT0002: Not available for connect from 10.8.4.21@o2ib6
(stopping)
[132358.172285] Lustre: Skipped 263 previous similar messages
[132358.218108] LustreError: 21352:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.9.9@o2ib6 arrived at 1549649655 with bad export cookie 2466892173948338486
[132358.447512] LustreError: 20506:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8961ae310c00 x1624787302045984/t0(0) o41->fir-MDT0003-osp-MDT0000@10.0.10.52@o2ib7:24/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
[132361.071611] LustreError: 30466:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.11.28@o2ib6 arrived at 1549649658 with bad export cookie 2466892173948344919
[132361.087166] LustreError: 30466:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 3 previous similar messages
[132362.168524] Lustre: fir-MDT0000: Not available for connect from 10.8.17.20@o2ib6 (stopping)
[132362.176979] Lustre: Skipped 405 previous similar messages
[132362.515665] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.113.6@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server.
[132362.533028] LustreError: Skipped 1 previous similar message
[132362.967204] Lustre: server umount fir-MDT0000 complete
[132363.050752] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.8.7.26@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server.
[132363.068034] LustreError: Skipped 39 previous similar messages
[132364.059305] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.8.22.8@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server.
[132364.076583] LustreError: Skipped 44 previous similar messages
[132364.298707] Lustre: server umount fir-MDT0002 complete
[132366.726988] LustreError: 21344:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.9.106.46@o2ib4 arrived at 1549649663 with bad export cookie 2466892173948351772
[132366.742625] LustreError: 21344:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 4 previous similar messages
[132375.156979] LustreError: 21352:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.9.101.4@o2ib4 arrived at 1549649672 with bad export cookie 2466892173948350113
[132375.172530] LustreError: 21352:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 3 previous similar messages
[132437.100460] Lustre: 53491:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549649728/real 1549649728] req@ffff8995c5ebf200 x1624787302047440/t0(0) o251->MGC10.0.10.51@o2ib7@0@lo:26/25 lens 224/224 e 0 to 1 dl 1549649734 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
[132437.127535] Lustre: 53491:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message
[132437.620277] Lustre: server umount MGS complete
[132438.044216] LNetError: 7332:0:(o2iblnd_cb.c:2469:kiblnd_passive_connect()) Can't accept conn from 10.0.10.202@o2ib7 on NA (ib0:1:10.0.10.51): bad dst nid 10.0.10.51@o2ib7
[132438.059508] LNetError: 7332:0:(o2iblnd_cb.c:2469:kiblnd_passive_connect()) Skipped 1 previous similar message
[132438.846360] LNetError: 7332:0:(o2iblnd_cb.c:2469:kiblnd_passive_connect()) Can't accept conn from 10.0.10.212@o2ib7 on NA (ib0:1:10.0.10.51): bad dst nid 10.0.10.51@o2ib7
[132438.861667] LNetError: 7332:0:(o2iblnd_cb.c:2469:kiblnd_passive_connect()) Skipped 15 previous similar messages
[132440.039533] LNet: Removed LNI 10.0.10.51@o2ib7
[132456.504885] LNet: HW NUMA nodes: 4, HW CPU cores: 48, npartitions: 4
[132456.512791] alg: No test for adler32 (adler32-zlib)
[132457.348282] Lustre: Lustre: Build Version: 2.12.0
[132457.491749] LNet: Using FastReg for registration
[132457.508450] LNet: Added LNI 10.0.10.51@o2ib7 [8/256/0/180]
[132458.700322] LDISKFS-fs (dm-2): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
[132459.024057] Lustre: MGS: Connection restored to 88157be5-2bb6-9cd2-0dde-ef49668882b4 (at 0@lo)
[132459.032757] Lustre: Skipped 1 previous similar message
[132459.534176] Lustre: MGS: Connection restored to 4791ab67-b098-1c8e-2f57-bf7b0dfb741f (at 10.9.106.26@o2ib4)
[132459.544002] Lustre: Skipped 21 previous similar messages
[132460.560056] Lustre: MGS: Connection restored to 6e4a5cb4-2ac7-93d0-3f4e-e1020d18c9a5 (at 10.8.11.35@o2ib6)
[132460.569799] Lustre: Skipped 30 previous similar messages
[132462.578036] Lustre: MGS: Connection restored to 66c2bd6c-9298-b4a0-e75b-2dc006ca2b27 (at 10.8.30.4@o2ib6)
[132462.587686] Lustre: Skipped 87 previous similar messages
[132466.609709] Lustre: MGS: Connection restored to (at 10.9.102.29@o2ib4)
[132466.616613] Lustre: Skipped 84 previous similar messages
[132475.329894] Lustre: MGS: Connection restored to 322a9602-ef78-4ec1-1793-487b5f049ad5 (at 10.0.10.101@o2ib7)
[132475.339728] Lustre: Skipped 23 previous similar messages
[132491.331218] Lustre: MGS: Connection restored to dc20a1dd-5505-3c43-807e-27c81934005a (at 10.8.30.11@o2ib6)
[132491.340963] Lustre: Skipped 120 previous similar messages
[132526.760239] Lustre: MGS: Connection restored to bfed2ed8-6a47-ad76-2200-365cb56f5c95 (at 10.9.106.52@o2ib4)
[132526.770065] Lustre: Skipped 984 previous similar messages
[132535.201054] LDISKFS-fs (dm-4): file extents enabled
[132535.204022] LDISKFS-fs (dm-0): file extents enabled
[132535.204023] , maximum tree depth=5
[132535.212755] , maximum tree depth=5
[132535.401376] LDISKFS-fs (dm-4): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,acl,no_mbcache,nodelalloc
[132535.402121] LDISKFS-fs (dm-0): mounted filesystem with ordered data mode.
Opts: user_xattr,errors=remount-ro,acl,no_mbcache,nodelalloc
[132535.850105] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.8.1.5@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server.
[132535.867306] LustreError: Skipped 2 previous similar messages
[132535.910996] Lustre: fir-MDT0002: Not available for connect from 10.8.3.10@o2ib6 (not set up)
[132536.358516] LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.9.105.39@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server.
[132536.375977] LustreError: Skipped 27 previous similar messages
[132537.381854] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.102.13@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server.
[132537.399312] LustreError: Skipped 49 previous similar messages
[132538.007720] LustreError: 11-0: fir-OST0027-osc-MDT0002: operation ost_connect to node 10.0.10.108@o2ib7 failed: rc = -16
[132538.065643] Lustre: fir-MDT0002: Imperative Recovery not enabled, recovery window 300-900
[132538.142332] Lustre: fir-MDD0002: changelog on
[132538.152792] Lustre: fir-MDT0002: in recovery but waiting for the first client to connect
[132538.153063] Lustre: fir-MDT0002: Will be in recovery for at least 5:00, or until 1354 clients reconnect
[132538.408738] Lustre: fir-MDT0000: Not available for connect from 10.9.107.7@o2ib4 (not set up)
[132538.417348] Lustre: Skipped 9 previous similar messages
[132538.592049] LustreError: 11-0: fir-MDT0002-osp-MDT0000: operation mds_connect to node 0@lo failed: rc = -114
[132538.601960] LustreError: Skipped 89 previous similar messages
[132538.670396] Lustre: fir-MDT0000: Imperative Recovery not enabled, recovery window 300-900
[132538.793418] Lustre: fir-MDD0000: changelog on
[132538.802372] Lustre: fir-MDT0000: in recovery but waiting for the first client to connect
[132538.812481] Lustre: fir-MDT0000: Will be in recovery for at least 5:00, or until 1352 clients reconnect
[132539.396800] LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.9.103.22@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server.
[132539.414275] LustreError: Skipped 154 previous similar messages
[132543.445336] LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.0.10.108@o2ib7 (no target). If you are running an HA pair check that the target is mounted on the other server.
[132543.462820] LustreError: Skipped 197 previous similar messages
[132552.355684] LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.0.10.101@o2ib7 (no target). If you are running an HA pair check that the target is mounted on the other server.
[132552.373166] LustreError: Skipped 82 previous similar messages
[132563.748973] LustreError: 11-0: fir-OST000e-osc-MDT0002: operation ost_connect to node 10.0.10.103@o2ib7 failed: rc = -16
[132563.759942] LustreError: Skipped 90 previous similar messages
[132588.837683] LustreError: 11-0: fir-OST000a-osc-MDT0002: operation ost_connect to node 10.0.10.101@o2ib7 failed: rc = -16
[132588.848689] LustreError: Skipped 89 previous similar messages
[132613.926125] LustreError: 11-0: fir-OST0027-osc-MDT0002: operation ost_connect to node 10.0.10.108@o2ib7 failed: rc = -16
[132613.937117] LustreError: Skipped 89 previous similar messages
[132639.014905] LustreError: 11-0: fir-OST000e-osc-MDT0002: operation ost_connect to node 10.0.10.103@o2ib7 failed: rc = -16
[132639.025865] LustreError: Skipped 81 previous similar messages
[132645.299856] Lustre: fir-MDT0002: Connection restored to 0d285bc8-b45e-6411-8b46-4976b63d3fe4 (at 10.9.104.71@o2ib4)
[132645.310384] Lustre: Skipped 2805 previous similar messages
[132645.404792] Lustre: fir-MDT0000: Recovery over after 1:47, of 1354 clients 1354 recovered and 0 were evicted.
[132645.728924] Lustre: fir-MDT0002: Recovery already passed deadline 1:47, It is most likely due to DNE recovery is failed or stuck, please wait a few more minutes or abort the recovery.
[132664.103355] LustreError: 11-0: fir-OST0027-osc-MDT0002: operation ost_connect to node 10.0.10.108@o2ib7 failed: rc = -16
[132664.114339] LustreError: Skipped 77 previous similar messages
[132714.280614] LustreError: 11-0: fir-OST0027-osc-MDT0002: operation ost_connect to node 10.0.10.108@o2ib7 failed: rc = -16
[132714.291568] LustreError: Skipped 151 previous similar messages
[132789.546505] LustreError: 11-0: fir-OST002e-osc-MDT0002: operation ost_connect to node 10.0.10.107@o2ib7 failed: rc = -16
[132789.557458] LustreError: Skipped 227 previous similar messages
[132974.783721] Lustre: 54210:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22
[133172.842070] Lustre: fir-MDT0000: haven't heard from client 14667fe1-47d5-8e19-3c0d-62eba7fdba91 (at 10.8.11.10@o2ib6) in 227 seconds. I think it's dead, and I am evicting it.
exp ffff8988a941d800, cur 1549650470 expire 1549650320 last 1549650243
[133203.671404] Lustre: MGS: Connection restored to f11f601d-401d-0aef-2f13-c25318014c3d (at 10.8.17.17@o2ib6)
[133203.681145] Lustre: Skipped 7 previous similar messages
[134478.263346] Lustre: MGS: Connection restored to 8261ca39-3eb8-d991-8a10-16b772d33645 (at 10.8.1.19@o2ib6)
[134478.273001] Lustre: Skipped 5 previous similar messages
[134801.500796] Lustre: 53951:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549652091/real 1549652091] req@ffff8996cc38b000 x1624925562812480/t0(0) o13->fir-OST0008-osc-MDT0000@10.0.10.101@o2ib7:7/4 lens 224/368 e 0 to 1 dl 1549652098 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[134801.529014] Lustre: fir-OST0008-osc-MDT0000: Connection to fir-OST0008 (at 10.0.10.101@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
[134801.545168] Lustre: Skipped 1 previous similar message
[134802.852838] Lustre: 53912:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549652092/real 1549652092] req@ffff896390076000 x1624925562812736/t0(0) o13->fir-OST0008-osc-MDT0002@10.0.10.101@o2ib7:7/4 lens 224/368 e 0 to 1 dl 1549652099 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[134802.868838] Lustre: fir-OST0006-osc-MDT0002: Connection to fir-OST0006 (at 10.0.10.101@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
[134802.897200] Lustre: 53912:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
[134803.549846] LNetError: 53887:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: tx_queue, 0 seconds
[134803.559928] LNetError: 53887:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.101@o2ib7 (6): c: 0, oc: 0, rc: 8
[134803.892860] Lustre: 53924:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549652094/real 1549652094] req@ffff89740bfdcb00 x1624925562813280/t0(0) o13->fir-OST0019-osc-MDT0000@10.0.10.106@o2ib7:7/4 lens 224/368 e 0 to 1 dl 1549652101 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[134803.921065] Lustre: 53924:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 8 previous similar messages
[134803.924860] Lustre: fir-OST001f-osc-MDT0000: Connection to fir-OST001f (at 10.0.10.106@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
[134803.924862] Lustre: Skipped 9 previous similar messages
[134805.550894] LNet: 53887:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.101@o2ib7: 2 seconds
[134806.549926] LNet: 53887:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.106@o2ib7: 3 seconds
[134806.560183] LNet: 53887:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 7 previous similar messages
[134807.549951] LNet: 53887:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.106@o2ib7: 0 seconds
[134807.560203] LNet: 53887:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 2 previous similar messages
[134856.551183] LNet: 53887:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.101@o2ib7: 47 seconds
[134906.552433] LNet: 53887:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.101@o2ib7: 40 seconds
[134906.562779] LNet: 53887:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 21 previous similar messages
[134956.553685] LNet: 53887:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.101@o2ib7: 40 seconds
[134956.564027] LNet: 53887:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 21 previous similar messages
[135006.889948] Lustre: fir-MDT0002: haven't heard from client fir-MDT0002-lwp-OST001f_UUID (at 10.0.10.106@o2ib7) in 227 seconds. I think it's dead, and I am evicting it.
exp ffff8981287bb400, cur 1549652304 expire 1549652154 last 1549652077
[135006.911128] Lustre: Skipped 2 previous similar messages
[135007.554965] LNet: 53887:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.101@o2ib7: 41 seconds
[135007.565312] LNet: 53887:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 21 previous similar messages
[135056.556194] LNet: 53887:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.101@o2ib7: 45 seconds
[135056.566539] LNet: 53887:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 20 previous similar messages
[135157.558741] LNet: 53887:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.101@o2ib7: 146 seconds
[135157.569170] LNet: 53887:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 47 previous similar messages
[135300.479900] Lustre: MGS: Connection restored to 9f8dd812-abf7-6ee9-ef7c-653281296d9c (at 10.8.24.20@o2ib6)
[135300.489640] Lustre: Skipped 2 previous similar messages
[135307.562496] LNet: 53887:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.101@o2ib7: 42 seconds
[135307.572839] LNet: 53887:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 71 previous similar messages
[135563.568932] LNet: 53887:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.101@o2ib7: 0 seconds
[135563.579189] LNet: 53887:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 142 previous similar messages
[135934.008687] Lustre: MGS: Connection restored to b35c2b42-1604-5dbe-405e-b8c7a374087c (at 10.8.18.35@o2ib6)
[136110.582643] LNet: 53887:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.101@o2ib7: 40 seconds
[136110.592990] LNet: 53887:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 240 previous similar messages
[136575.130271] Lustre: fir-MDT0000: Connection restored to 9f8dd812-abf7-6ee9-ef7c-653281296d9c (at 10.8.24.20@o2ib6)
[136575.140711] Lustre: Skipped 1 previous similar message
[136583.399644] Lustre: MGS: Connection restored to 10.0.10.106@o2ib7 (at 10.0.10.106@o2ib7)
[136584.922596] Lustre: fir-MDT0002: Connection restored to 10.0.10.106@o2ib7 (at 10.0.10.106@o2ib7)
[136584.931491] Lustre: Skipped 1 previous similar message
[136589.303423] Lustre: MGS: Connection restored to 10.0.10.101@o2ib7 (at 10.0.10.101@o2ib7)
[136589.311608] Lustre: Skipped 10 previous similar messages
[136608.718625] Lustre: fir-MDT0000: Connection restored to b35c2b42-1604-5dbe-405e-b8c7a374087c (at 10.8.18.35@o2ib6)
[136608.729062] Lustre: Skipped 13 previous similar messages
[136736.398491] Lustre: fir-OST001f-osc-MDT0000: Connection restored to 10.0.10.106@o2ib7 (at 10.0.10.106@o2ib7)
[136752.735042] Lustre: fir-OST0023-osc-MDT0000: Connection restored to 10.0.10.106@o2ib7 (at 10.0.10.106@o2ib7)
[136752.744953] Lustre: Skipped 17 previous similar messages
[136871.634332] Lustre: 56074:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22
[136877.646238] Lustre: 54696:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22
[136877.657994] Lustre: 54696:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 11 previous similar messages
[136879.757428] Lustre: 55386:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22
[136879.769188] Lustre: 55386:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 8 previous similar messages
[136883.629639] Lustre: 54705:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22
[136883.641374] Lustre: 54705:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 11 previous similar messages
[136887.967173] Lustre: 54703:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22
[136887.978930] Lustre: 54703:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 71 previous similar messages
[136896.812529] Lustre: 54705:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22
[136896.824267] Lustre: 54705:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 111 previous similar messages
[136912.813839] Lustre: 54719:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22
[136912.825596] Lustre: 54719:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 308 previous similar messages
[136946.021731] Lustre: 55382:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22
[136946.033473] Lustre: 55382:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 480 previous similar messages
[136988.937847] Lustre: fir-MDT0000: haven't heard from client 6f635f45-ae2a-0aad-843c-d6486afb74d2 (at 10.8.10.33@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8997032cac00, cur 1549654286 expire 1549654136 last 1549654059
[136988.959648] Lustre: Skipped 25 previous similar messages
[137156.041155] Lustre: MGS: Connection restored to fd9ea46b-f23a-ba54-3c15-694997be6405 (at 10.8.10.33@o2ib6)
[137156.050901] Lustre: Skipped 5 previous similar messages
[137207.943954] Lustre: fir-MDT0000: haven't heard from client eda8c254-fde6-5eef-edaa-9ca909eae6a7 (at 10.8.21.19@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8997e9382c00, cur 1549654505 expire 1549654355 last 1549654278
[137207.965746] Lustre: Skipped 2 previous similar messages
[137533.179960] Lustre: MGS: Connection restored to 16664fb3-beb0-e49a-d4b6-262264fe49f5 (at 10.8.21.19@o2ib6)
[137533.189700] Lustre: Skipped 2 previous similar messages
[138090.965385] Lustre: fir-MDT0000: haven't heard from client 7017ed88-5010-e10e-c1f4-85540f0670b6 (at 10.8.10.36@o2ib6) in 227 seconds. I think it's dead, and I am evicting it.
exp ffff8997fa216800, cur 1549655388 expire 1549655238 last 1549655161
[138090.987179] Lustre: Skipped 2 previous similar messages
[138172.736402] Lustre: 54214:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549655462/real 1549655462] req@ffff8982cee0da00 x1624925580118144/t0(0) o106->fir-MDT0000@10.8.10.33@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1549655469 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[138172.763744] Lustre: 54214:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 8 previous similar messages
[138179.773573] Lustre: 54214:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549655469/real 1549655469] req@ffff8982cee0da00 x1624925580118144/t0(0) o106->fir-MDT0000@10.8.10.33@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1549655476 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[138186.800751] Lustre: 54214:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549655476/real 1549655476] req@ffff8982cee0da00 x1624925580118144/t0(0) o106->fir-MDT0000@10.8.10.33@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1549655483 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[138193.827926] Lustre: 54214:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549655483/real 1549655483] req@ffff8982cee0da00 x1624925580118144/t0(0) o106->fir-MDT0000@10.8.10.33@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1549655490 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[138200.855103] Lustre: 54214:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549655490/real 1549655490] req@ffff8982cee0da00 x1624925580118144/t0(0) o106->fir-MDT0000@10.8.10.33@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1549655497 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[138214.882448] Lustre: 54214:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549655504/real 1549655504] req@ffff8982cee0da00 x1624925580118144/t0(0) o106->fir-MDT0000@10.8.10.33@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1549655511 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[138214.909812] Lustre: 54214:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message
[138216.607551] Lustre: MGS: Connection restored to f66cd5e9-589a-13f8-79f0-29d09dd327e1 (at 10.8.10.36@o2ib6)
[138216.617292] Lustre: Skipped 2 previous similar messages
[138235.919981] Lustre: 54214:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549655525/real 1549655525] req@ffff8982cee0da00 x1624925580118144/t0(0) o106->fir-MDT0000@10.8.10.33@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1549655532 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[138235.947322] Lustre: 54214:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
[138270.957861] Lustre: 54214:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549655560/real 1549655560] req@ffff8982cee0da00 x1624925580118144/t0(0) o106->fir-MDT0000@10.8.10.33@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1549655567 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[138270.985206] Lustre: 54214:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages
[138273.970169] Lustre: fir-MDT0002: haven't heard from client b4e607c7-6408-22c3-e114-9f5caf76c169 (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8961986f2400, cur 1549655571 expire 1549655421 last 1549655344
[138273.991895] Lustre: Skipped 2 previous similar messages
[138290.970418] Lustre: fir-MDT0000: haven't heard from client b4e607c7-6408-22c3-e114-9f5caf76c169 (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it.
exp ffff8997f67b7800, cur 1549655588 expire 1549655438 last 1549655361
[138290.992127] Lustre: Skipped 1 previous similar message
[138293.985017] Lustre: MGS: haven't heard from client df217bff-deec-715e-3f90-260fc86b0960 (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89783fec6000, cur 1549655591 expire 1549655441 last 1549655364
[138294.006041] Lustre: Skipped 1 previous similar message
[138346.523770] Lustre: 56099:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549655636/real 1549655636] req@ffff896e9a6fce00 x1624925581209792/t0(0) o106->fir-MDT0000@10.8.13.14@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1549655643 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[138346.551109] Lustre: 56099:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
[138349.974792] Lustre: fir-MDT0002: haven't heard from client baf8fd69-3c06-48ef-44c1-991a98b1784d (at 10.8.13.14@o2ib6) in 189 seconds. I think it's dead, and I am evicting it. exp ffff896163a7d800, cur 1549655647 expire 1549655497 last 1549655458
[138349.996596] Lustre: Skipped 1 previous similar message
[138366.972720] Lustre: fir-MDT0000: haven't heard from client 2336058a-aa9c-4463-bac9-a8ea66369e87 (at 10.8.11.22@o2ib6) in 203 seconds. I think it's dead, and I am evicting it.
exp ffff8997f2b00800, cur 1549655664 expire 1549655514 last 1549655461
[138366.994508] Lustre: Skipped 1 previous similar message
[138547.658379] Lustre: MGS: Connection restored to a97d368a-6cfe-bdb4-3274-54d93aaf3f16 (at 10.8.13.14@o2ib6)
[138547.668122] Lustre: Skipped 5 previous similar messages
[138882.544463] Lustre: Modifying parameter osc.fir-OST*.osc.max_pages_per_rpc in log params
[139361.867245] Lustre: 56070:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549656651/real 1549656651] req@ffff8964eff11e00 x1624925598388672/t0(0) o104->fir-MDT0000@10.8.11.22@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549656658 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[139361.894594] Lustre: 56070:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
[139382.904781] Lustre: 56070:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549656672/real 1549656672] req@ffff8964eff11e00 x1624925598388672/t0(0) o104->fir-MDT0000@10.8.11.22@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549656679 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[139382.932124] Lustre: 56070:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
[139417.943642] Lustre: 56070:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549656707/real 1549656707] req@ffff8964eff11e00 x1624925598388672/t0(0) o104->fir-MDT0000@10.8.11.22@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549656714 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[139417.971005] Lustre: 56070:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages
[139487.982404] Lustre: 56070:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549656777/real 1549656777] req@ffff8964eff11e00 x1624925598388672/t0(0) o104->fir-MDT0000@10.8.11.22@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549656784 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[139488.009764] Lustre: 56070:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 9 previous similar messages
[139509.020967] LustreError: 56070:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.11.22@o2ib6) failed to reply to blocking AST (req@ffff8964eff11e00 x1624925598388672 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff89753fb03cc0/0x1a4b7ac35c1c238d lrc: 4/0,0 mode: PR/PR res: [0x200000406:0x4a8:0x0].0x0 bits 0x13/0x0 rrc: 78 type: IBT flags: 0x60200400000020 nid: 10.8.11.22@o2ib6 remote: 0xc6ba24065511b15e expref: 16 pid: 54713 timeout: 139647 lvb_type: 0
[139509.063740] LustreError: 138-a: fir-MDT0000: A client on nid 10.8.11.22@o2ib6 was evicted due to a lock blocking callback time out: rc -110
[139509.076357] LustreError: 54055:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 155s: evicting client at 10.8.11.22@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff89753fb03cc0/0x1a4b7ac35c1c238d lrc: 3/0,0 mode: PR/PR res: [0x200000406:0x4a8:0x0].0x0 bits 0x13/0x0 rrc: 78 type: IBT flags: 0x60200400000020 nid: 10.8.11.22@o2ib6 remote: 0xc6ba24065511b15e expref: 17 pid: 54713 timeout: 0 lvb_type: 0
[139567.003309] Lustre: fir-MDT0002: haven't heard from client 843044c5-b529-765e-1bbc-6729c498b3a5 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89706439b800, cur 1549656864 expire 1549656714 last 1549656637
[139567.025098] Lustre: Skipped 3 previous similar messages
[139618.639743] Lustre: MGS: Connection restored to 4f488fb9-6b99-0046-82a0-51dc85b62ae9 (at 10.8.11.22@o2ib6)
[139618.649483] Lustre: Skipped 8 previous similar messages
[139947.011952] Lustre: fir-MDT0002: haven't heard from client 647d49b6-6175-c5f2-9b13-ecaf32f220d3 (at 10.8.18.35@o2ib6) in 227 seconds. I think it's dead, and I am evicting it.
exp ffff8977cbeedc00, cur 1549657244 expire 1549657094 last 1549657017
[139947.033743] Lustre: Skipped 1 previous similar message
[140303.020881] Lustre: fir-MDT0002: haven't heard from client 70ad2d31-6c1d-f63a-19fa-86804a246997 (at 10.8.18.35@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89717525a800, cur 1549657600 expire 1549657450 last 1549657373
[140303.042689] Lustre: Skipped 5 previous similar messages
[140385.915793] Lustre: MGS: Connection restored to b35c2b42-1604-5dbe-405e-b8c7a374087c (at 10.8.18.35@o2ib6)
[140385.925532] Lustre: Skipped 8 previous similar messages
[140558.027311] Lustre: fir-MDT0000: haven't heard from client 8bc0b59d-4176-29ca-8993-ccbed2d8cd47 (at 10.8.13.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89820c7c1c00, cur 1549657855 expire 1549657705 last 1549657628
[140558.049134] Lustre: Skipped 2 previous similar messages
[140778.033918] Lustre: fir-MDT0002: haven't heard from client 52df0e9e-984a-2f80-9cdb-0e81aebea170 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89807b7ed800, cur 1549658075 expire 1549657925 last 1549657848
[140778.055712] Lustre: Skipped 5 previous similar messages
[140854.034779] Lustre: fir-MDT0000: haven't heard from client b846950b-5bea-339d-e553-c5a9af56a183 (at 10.8.18.35@o2ib6) in 208 seconds. I think it's dead, and I am evicting it. exp ffff8982a6fec000, cur 1549658151 expire 1549658001 last 1549657943
[140854.056597] Lustre: Skipped 2 previous similar messages
[140930.036603] Lustre: fir-MDT0000: haven't heard from client 73bf6493-f7ad-5630-e388-f857edbadfbd (at 10.8.13.14@o2ib6) in 164 seconds. I think it's dead, and I am evicting it.
exp ffff8971f236e800, cur 1549658227 expire 1549658077 last 1549658063
[140930.058392] Lustre: Skipped 2 previous similar messages
[141120.043561] Lustre: MGS: Connection restored to a97d368a-6cfe-bdb4-3274-54d93aaf3f16 (at 10.8.13.14@o2ib6)
[141120.053297] Lustre: Skipped 17 previous similar messages
[141256.045876] Lustre: fir-MDT0000: haven't heard from client 0baed2ab-c9d8-fd86-6a1d-b744b9d0df57 (at 10.8.18.35@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897666281000, cur 1549658553 expire 1549658403 last 1549658326
[141256.067664] Lustre: Skipped 2 previous similar messages
[141598.053440] Lustre: fir-MDT0000: haven't heard from client 2bc95971-57af-d09b-f861-ded15d800fe7 (at 10.8.18.35@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8965e9a2b000, cur 1549658895 expire 1549658745 last 1549658668
[141598.075244] Lustre: Skipped 5 previous similar messages
[141890.845321] Lustre: MGS: Connection restored to a97d368a-6cfe-bdb4-3274-54d93aaf3f16 (at 10.8.13.14@o2ib6)
[141890.855061] Lustre: Skipped 11 previous similar messages
[142164.803579] Lustre: 54215:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549659454/real 1549659454] req@ffff898084bd0000 x1624925608126864/t0(0) o104->fir-MDT0000@10.8.13.14@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549659461 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[142164.830925] Lustre: 54215:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages
[142170.073329] Lustre: MGS: haven't heard from client b07e67b3-c79e-02e1-5950-7cf281891dc0 (at 10.8.18.35@o2ib6) in 227 seconds. I think it's dead, and I am evicting it.
exp ffff89978beec800, cur 1549659467 expire 1549659317 last 1549659240
[142170.094421] Lustre: Skipped 8 previous similar messages
[142185.842113] Lustre: 54215:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549659475/real 1549659475] req@ffff898084bd0000 x1624925608126864/t0(0) o104->fir-MDT0000@10.8.13.14@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549659482 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[142185.869469] Lustre: 54215:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
[142220.879984] Lustre: 54215:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549659510/real 1549659510] req@ffff898084bd0000 x1624925608126864/t0(0) o104->fir-MDT0000@10.8.13.14@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549659517 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[142220.907328] Lustre: 54215:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages
[142290.919752] Lustre: 54215:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549659580/real 1549659580] req@ffff898084bd0000 x1624925608126864/t0(0) o104->fir-MDT0000@10.8.13.14@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549659587 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[142290.947092] Lustre: 54215:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 9 previous similar messages
[142675.616044] Lustre: MGS: Connection restored to b35c2b42-1604-5dbe-405e-b8c7a374087c (at 10.8.18.35@o2ib6)
[142675.625790] Lustre: Skipped 11 previous similar messages
[142841.085306] Lustre: fir-MDT0000: haven't heard from client 2d41560b-ed16-c5ed-243a-4772c3236b9b (at 10.8.13.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it.
exp ffff897656e9dc00, cur 1549660138 expire 1549659988 last 1549659911
[142841.107093] Lustre: Skipped 8 previous similar messages
[143425.907127] Lustre: MGS: Connection restored to b35c2b42-1604-5dbe-405e-b8c7a374087c (at 10.8.18.35@o2ib6)
[143425.916868] Lustre: Skipped 8 previous similar messages
[143466.842254] Lustre: 54714:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549660756/real 1549660756] req@ffff89661ffbdd00 x1624925650937968/t0(0) o104->fir-MDT0000@10.8.13.14@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549660763 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[143487.869782] Lustre: 54714:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549660777/real 1549660777] req@ffff89661ffbdd00 x1624925650937968/t0(0) o104->fir-MDT0000@10.8.13.14@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549660784 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[143487.897344] Lustre: 54714:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
[143522.907657] Lustre: 54714:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549660812/real 1549660812] req@ffff89661ffbdd00 x1624925650937968/t0(0) o104->fir-MDT0000@10.8.13.14@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549660819 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[143522.935021] Lustre: 54714:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages
[143547.105706] Lustre: fir-MDT0002: haven't heard from client 5c7ff6bb-6ea2-5c1f-a993-db1bea2df0d0 (at 10.8.13.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it.
exp ffff89768a675400, cur 1549660844 expire 1549660694 last 1549660617
[143547.127501] Lustre: Skipped 8 previous similar messages
[144212.616517] Lustre: MGS: Connection restored to b35c2b42-1604-5dbe-405e-b8c7a374087c (at 10.8.18.35@o2ib6)
[144212.626258] Lustre: Skipped 8 previous similar messages
[144542.127600] Lustre: fir-MDT0000: haven't heard from client daf13793-62c8-b0f4-d2a4-5d7550b6b47f (at 10.8.18.35@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89843dab4400, cur 1549661839 expire 1549661689 last 1549661612
[144542.149391] Lustre: Skipped 8 previous similar messages
[144889.994479] Lustre: MGS: Connection restored to b35c2b42-1604-5dbe-405e-b8c7a374087c (at 10.8.18.35@o2ib6)
[144890.004220] Lustre: Skipped 5 previous similar messages
[145169.143128] Lustre: fir-MDT0000: haven't heard from client 57b224bb-b0ae-f353-dccc-dcc107baacf5 (at 10.8.18.35@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff898680b97000, cur 1549662466 expire 1549662316 last 1549662239
[145169.164934] Lustre: Skipped 5 previous similar messages
[145730.313086] Lustre: MGS: Connection restored to b35c2b42-1604-5dbe-405e-b8c7a374087c (at 10.8.18.35@o2ib6)
[145730.322826] Lustre: Skipped 5 previous similar messages
[146108.167425] Lustre: fir-MDT0000: haven't heard from client 82a36d0f-231c-d7d8-dcbb-666c11eaac0e (at 10.8.18.35@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8971ba2ba400, cur 1549663405 expire 1549663255 last 1549663178
[146108.189227] Lustre: Skipped 8 previous similar messages
[146412.131461] Lustre: MGS: Connection restored to df217bff-deec-715e-3f90-260fc86b0960 (at 10.8.11.9@o2ib6)
[146412.141114] Lustre: Skipped 8 previous similar messages
[146805.184140] Lustre: fir-MDT0002: haven't heard from client 9a224c56-fdb3-753c-44a3-250f87d37f06 (at 10.8.18.35@o2ib6) in 227 seconds. I think it's dead, and I am evicting it.
exp ffff897089721400, cur 1549664102 expire 1549663952 last 1549663875
[146805.205946] Lustre: Skipped 11 previous similar messages
[150441.285415] Lustre: fir-MDT0000: haven't heard from client 5bfe4986-235b-4715-c985-36823d3f937e (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896415a61c00, cur 1549667738 expire 1549667588 last 1549667511
[150441.307201] Lustre: Skipped 5 previous similar messages
[150628.887151] Lustre: MGS: Connection restored to 4f488fb9-6b99-0046-82a0-51dc85b62ae9 (at 10.8.11.22@o2ib6)
[150628.896897] Lustre: Skipped 11 previous similar messages
[151084.291422] Lustre: fir-MDT0000: haven't heard from client def3c5c8-3e35-1d99-db1f-2e603078f6cb (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897076ed3400, cur 1549668381 expire 1549668231 last 1549668154
[151084.313215] Lustre: Skipped 2 previous similar messages
[151265.912199] Lustre: MGS: Connection restored to 4f488fb9-6b99-0046-82a0-51dc85b62ae9 (at 10.8.11.22@o2ib6)
[151265.921948] Lustre: Skipped 2 previous similar messages
[151620.306092] Lustre: fir-MDT0000: haven't heard from client d451708f-fe0e-7a5b-e786-150c3376d848 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it.
exp ffff897081a9a400, cur 1549668917 expire 1549668767 last 1549668690
[151620.327905] Lustre: Skipped 2 previous similar messages
[151692.936337] Lustre: MGS: Connection restored to b4cbb7e0-3c50-c595-5d00-119e4e161609 (at 10.9.105.45@o2ib4)
[151692.946165] Lustre: Skipped 2 previous similar messages
[151870.611134] Lustre: 57468:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549669116/real 1549669116] req@ffff8997f79ff800 x1624925701602816/t0(0) o104->fir-MDT0002@10.9.105.45@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1549669167 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[151870.638561] Lustre: 57468:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages
[151921.649416] Lustre: 57468:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549669167/real 1549669167] req@ffff8997f79ff800 x1624925701602816/t0(0) o104->fir-MDT0002@10.9.105.45@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1549669218 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[151972.678692] Lustre: 57468:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549669218/real 1549669218] req@ffff8997f79ff800 x1624925701602816/t0(0) o104->fir-MDT0002@10.9.105.45@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1549669269 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[151972.706133] LustreError: 57468:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.9.105.45@o2ib4) failed to reply to blocking AST (req@ffff8997f79ff800 x1624925701602816 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff89745c6b3180/0x1a4b7ac394144404 lrc: 4/0,0 mode: PR/PR res: [0x2c00042b2:0x1:0x0].0x0 bits 0x13/0x0 rrc: 3 type: IBT flags: 0x60200400000020 nid: 10.9.105.45@o2ib4 remote: 0xa567adbc8f1c91de expref: 31 pid: 56086 timeout: 152067 lvb_type: 0
[151972.748841] LustreError: 138-a: fir-MDT0002: A client on nid 10.9.105.45@o2ib4 was evicted due to a lock blocking callback time out: rc -110
[151972.761565] LustreError: 54055:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 153s: evicting client at 10.9.105.45@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff89745c6b3180/0x1a4b7ac394144404 lrc: 3/0,0 mode: PR/PR res: [0x2c00042b2:0x1:0x0].0x0 bits 0x13/0x0 rrc: 3 type: IBT flags: 0x60200400000020 nid: 10.9.105.45@o2ib4 remote: 0xa567adbc8f1c91de expref: 32 pid: 56086 timeout: 0 lvb_type: 0
[152023.315484] Lustre: fir-MDT0000: haven't heard from client 0958683a-a0cd-a3d6-4d5f-4c0b9a2809a6 (at 10.9.105.45@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8996a4f22800, cur 1549669320 expire 1549669170 last 1549669093
[152023.337365] Lustre: Skipped 5 previous similar messages
[152034.853588] Lustre: MGS: Connection restored to b4cbb7e0-3c50-c595-5d00-119e4e161609 (at 10.9.105.45@o2ib4)
[152034.863416] Lustre: Skipped 5 previous similar messages
[152643.076426] Lustre: MGS: Connection restored to 4f488fb9-6b99-0046-82a0-51dc85b62ae9 (at 10.8.11.22@o2ib6)
[152643.086166] Lustre: Skipped 5 previous similar messages
[152947.338401] Lustre: fir-MDT0000: haven't heard from client 231a6b48-47a4-329d-19ac-99578db8c267 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it.
exp ffff8975c5687800, cur 1549670244 expire 1549670094 last 1549670017 [152947.360236] Lustre: Skipped 7 previous similar messages [153392.227846] LustreError: 62220:0:(obd_config.c:1261:class_process_config()) this lcfg command requires a device name [153394.393176] LustreError: 62221:0:(obd_config.c:1261:class_process_config()) this lcfg command requires a device name [153398.470993] LustreError: 62222:0:(obd_config.c:1261:class_process_config()) this lcfg command requires a device name [153616.284278] Lustre: MGS: Connection restored to 4f488fb9-6b99-0046-82a0-51dc85b62ae9 (at 10.8.11.22@o2ib6) [153616.294020] Lustre: Skipped 5 previous similar messages [153944.363178] Lustre: fir-MDT0000: haven't heard from client ee23baab-77d3-3930-74e9-c93f730e142b (at 10.8.21.19@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8982d6309000, cur 1549671241 expire 1549671091 last 1549671014 [153944.384971] Lustre: Skipped 5 previous similar messages [154220.203462] Lustre: MGS: Connection restored to 4f488fb9-6b99-0046-82a0-51dc85b62ae9 (at 10.8.11.22@o2ib6) [154220.213201] Lustre: Skipped 5 previous similar messages [154700.382344] Lustre: fir-MDT0000: haven't heard from client aaf017e1-98c3-f97d-d4ea-425a6657968e (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89874eb46400, cur 1549671997 expire 1549671847 last 1549671770 [154700.404152] Lustre: Skipped 5 previous similar messages [154869.581574] Lustre: MGS: Connection restored to 4f488fb9-6b99-0046-82a0-51dc85b62ae9 (at 10.8.11.22@o2ib6) [154869.591314] Lustre: Skipped 5 previous similar messages [155391.417054] Lustre: MGS: haven't heard from client 1c386af7-2312-b03b-6fdb-114c8b52223d (at 10.8.18.35@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8996abbf6400, cur 1549672688 expire 1549672538 last 1549672461 [155391.438149] Lustre: Skipped 11 previous similar messages [155718.261255] Lustre: MGS: Connection restored to b35c2b42-1604-5dbe-405e-b8c7a374087c (at 10.8.18.35@o2ib6) [155718.270996] Lustre: Skipped 14 previous similar messages [156312.422611] Lustre: fir-MDT0000: haven't heard from client 3c4bde33-5fdc-eb6c-c93e-9e0bd78c7e67 (at 10.8.18.35@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897583ed4c00, cur 1549673609 expire 1549673459 last 1549673382 [156312.444410] Lustre: Skipped 8 previous similar messages [156372.005951] Lustre: MGS: Connection restored to b35c2b42-1604-5dbe-405e-b8c7a374087c (at 10.8.18.35@o2ib6) [156372.015694] Lustre: Skipped 5 previous similar messages [157101.442770] Lustre: fir-MDT0002: haven't heard from client 06d1eec8-5894-312d-d593-1c1183dfa6da (at 10.8.18.35@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8982fc23f000, cur 1549674398 expire 1549674248 last 1549674171 [157101.464560] Lustre: Skipped 8 previous similar messages [157154.230764] Lustre: MGS: Connection restored to b35c2b42-1604-5dbe-405e-b8c7a374087c (at 10.8.18.35@o2ib6) [157154.240509] Lustre: Skipped 8 previous similar messages [157717.457869] Lustre: fir-MDT0002: haven't heard from client 662a3d82-8849-9bb2-4128-cd5192814a19 (at 10.8.21.19@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8970f0276c00, cur 1549675014 expire 1549674864 last 1549674787 [157717.479676] Lustre: Skipped 5 previous similar messages [157786.213500] Lustre: MGS: Connection restored to 16664fb3-beb0-e49a-d4b6-262264fe49f5 (at 10.8.21.19@o2ib6) [157786.223242] Lustre: Skipped 5 previous similar messages [158590.480447] Lustre: fir-MDT0002: haven't heard from client 5f22821b-0472-716d-d871-bb9488893a1e (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff895809fe0800, cur 1549675887 expire 1549675737 last 1549675660 [158590.502235] Lustre: Skipped 5 previous similar messages [158711.070130] Lustre: MGS: Connection restored to 4f488fb9-6b99-0046-82a0-51dc85b62ae9 (at 10.8.11.22@o2ib6) [158711.079867] Lustre: Skipped 5 previous similar messages [159575.505358] Lustre: fir-MDT0002: haven't heard from client b3d1419f-591a-7854-124c-81b9eccbde9b (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8974707fa000, cur 1549676872 expire 1549676722 last 1549676645 [159575.527148] Lustre: Skipped 5 previous similar messages [159696.617950] Lustre: MGS: Connection restored to a9d6fa94-aa1c-f70a-7cae-a43d37e08037 (at 10.8.26.33@o2ib6) [159696.627691] Lustre: Skipped 5 previous similar messages [161182.546306] Lustre: fir-MDT0000: haven't heard from client eae463d3-9a7b-356d-af50-533d58b36c74 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89772131a000, cur 1549678479 expire 1549678329 last 1549678252 [161182.568103] Lustre: Skipped 5 previous similar messages [180154.785905] Lustre: Setting parameter *.osc.*.max_dirty_mb in log params [184068.120182] Lustre: fir-MDT0000: haven't heard from client fb5e4f4f-a645-fbec-7b80-08c8d8a2fea0 (at 10.8.1.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899709f98400, cur 1549701364 expire 1549701214 last 1549701137 [184068.141889] Lustre: Skipped 2 previous similar messages [214802.851596] Lustre: MGS: Connection restored to 70362e9e-2157-cc75-8b10-2f868df90fcb (at 10.8.15.7@o2ib6) [214802.861252] Lustre: Skipped 5 previous similar messages [214850.891575] Lustre: fir-MDT0002: haven't heard from client dcd406ad-ffdd-a7c9-489f-309957a1236e (at 10.8.15.7@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff89967b652800, cur 1549732146 expire 1549731996 last 1549731919 [214850.913289] Lustre: Skipped 2 previous similar messages [214862.897512] Lustre: fir-MDT0000: haven't heard from client dcd406ad-ffdd-a7c9-489f-309957a1236e (at 10.8.15.7@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899633f33000, cur 1549732158 expire 1549732008 last 1549731931 [214862.919232] Lustre: Skipped 1 previous similar message [232270.328916] Lustre: fir-MDT0002: haven't heard from client ba56f8dc-0155-bbc1-f64d-9b90836aeb9c (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8973bd2fcc00, cur 1549749565 expire 1549749415 last 1549749338 [232415.352622] Lustre: MGS: Connection restored to df217bff-deec-715e-3f90-260fc86b0960 (at 10.8.11.9@o2ib6) [232415.362277] Lustre: Skipped 2 previous similar messages [234431.384816] Lustre: fir-MDT0002: haven't heard from client 38a104b5-26ce-5d2d-596d-9304083f888f (at 10.9.112.14@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896190388800, cur 1549751726 expire 1549751576 last 1549751499 [234431.406698] Lustre: Skipped 2 previous similar messages [241128.551570] Lustre: fir-MDT0002: haven't heard from client 71256bfb-0197-3e32-60b3-6d6186c065c4 (at 10.9.105.45@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89643f712000, cur 1549758423 expire 1549758273 last 1549758196 [241128.573448] Lustre: Skipped 2 previous similar messages [241134.660043] Lustre: MGS: Connection restored to b4cbb7e0-3c50-c595-5d00-119e4e161609 (at 10.9.105.45@o2ib4) [241134.669879] Lustre: Skipped 2 previous similar messages [243377.607528] Lustre: fir-MDT0000: haven't heard from client dfad11f5-37c7-7014-c92a-508def843e75 (at 10.8.26.33@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8965f2a0d800, cur 1549760672 expire 1549760522 last 1549760445 [243377.629343] Lustre: Skipped 2 previous similar messages [243429.214127] Lustre: MGS: Connection restored to a9d6fa94-aa1c-f70a-7cae-a43d37e08037 (at 10.8.26.33@o2ib6) [243429.223875] Lustre: Skipped 2 previous similar messages [244181.781604] LNet: Service thread pid 63241 was inactive for 200.71s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: [244181.798632] Pid: 63241, comm: mdt00_047 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 [244181.808458] Call Trace: [244181.811010] [] ldlm_completion_ast+0x5b1/0x920 [ptlrpc] [244181.818038] [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] [244181.825351] [] mdt_object_local_lock+0x50b/0xb20 [mdt] [244181.832269] [] mdt_object_lock_internal+0x70/0x3e0 [mdt] [244181.839388] [] mdt_getattr_name_lock+0x11d/0x1c30 [mdt] [244181.846397] [] mdt_getattr_name+0xc4/0x2b0 [mdt] [244181.852806] [] tgt_request_handle+0xaea/0x1580 [ptlrpc] [244181.859853] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [244181.867659] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] [244181.874093] [] kthread+0xd1/0xe0 [244181.879095] [] ret_from_fork_nospec_begin+0xe/0x21 [244181.885672] [] 0xffffffffffffffff [244181.890795] LustreError: dumping log to /tmp/lustre-log.1549761476.63241 [244183.273183] LNet: Service thread pid 54657 was inactive for 202.20s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: [244183.290208] Pid: 54657, comm: mdt03_005 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 [244183.300039] Call Trace: [244183.302595] [] ldlm_completion_ast+0x5b1/0x920 [ptlrpc] [244183.309647] [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] [244183.316933] [] mdt_dom_discard_data+0x101/0x130 [mdt] [244183.323786] [] mdt_reint_unlink+0x331/0x14a0 [mdt] [244183.330359] [] mdt_reint_rec+0x83/0x210 [mdt] [244183.336511] [] mdt_reint_internal+0x6e3/0xaf0 [mdt] [244183.343183] [] mdt_reint+0x67/0x140 [mdt] [244183.348988] [] tgt_request_handle+0xaea/0x1580 [ptlrpc] [244183.356022] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [244183.363852] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] [244183.370270] [] kthread+0xd1/0xe0 [244183.375284] [] ret_from_fork_nospec_begin+0xe/0x21 [244183.381848] [] 0xffffffffffffffff [244281.068098] LustreError: 63241:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1549761275, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8984cf75f2c0/0x1a4b7ac433687a77 lrc: 3/1,0 mode: --/PR res: [0x2c0003bcd:0xd9c:0x0].0x0 bits 0x12/0x0 rrc: 5 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 63241 timeout: 0 lvb_type: 0 [244281.068147] LustreError: dumping log to /tmp/lustre-log.1549761575.54657 [244281.114355] LustreError: 63241:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message [244575.995503] Lustre: 54706:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8997bc2db600 x1624836094636288/t30146813153(0) o36->bef16258-699d-0e14-bdeb-b454fac00d89@10.9.112.15@o2ib4:555/0 lens 504/2888 e 24 to 0 dl 1549761875 ref 2 fl Interpret:/0/0 rc 0/0 [244583.121776] Lustre: fir-MDT0002: Client bef16258-699d-0e14-bdeb-b454fac00d89 (at 10.9.112.15@o2ib4) reconnecting 
[244583.131597] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[244583.131600] Lustre: Skipped 2 previous similar messages
[244583.147726] Lustre: Skipped 1 previous similar message
[244611.638450] Lustre: fir-MDT0002: haven't heard from client 36b574bd-6f14-9f2c-6c78-1e23c5b6197c (at 10.8.26.33@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896211795800, cur 1549761906 expire 1549761756 last 1549761679
[244611.660259] Lustre: Skipped 2 previous similar messages
[244614.903701] LustreError: 55219:0:(mdt_io.c:442:mdt_preprw_write()) fir-MDT0002: BRW to missing obj [0x2c0001671:0x422:0x0]
[244624.317605] LustreError: 55224:0:(mdt_io.c:442:mdt_preprw_write()) fir-MDT0002: BRW to missing obj [0x2c000175d:0xa8:0x0]
[244656.607130] LustreError: 55229:0:(mdt_io.c:442:mdt_preprw_write()) fir-MDT0002: BRW to missing obj [0x2c0003376:0x6f6:0x0]
[244692.719209] LustreError: 55319:0:(mdt_io.c:442:mdt_preprw_write()) fir-MDT0002: BRW to missing obj [0x2c00033ab:0x1b1:0x0]
[244805.455423] Lustre: MGS: Connection restored to a9d6fa94-aa1c-f70a-7cae-a43d37e08037 (at 10.8.26.33@o2ib6)
[244805.465172] Lustre: Skipped 1 previous similar message
[245031.656254] Lustre: MGS: haven't heard from client 26f09324-35e0-3de4-b8c0-7058e8a8d58c (at 10.8.26.33@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89832f291400, cur 1549762326 expire 1549762176 last 1549762099
[245031.677464] Lustre: Skipped 2 previous similar messages
[245034.649054] Lustre: fir-MDT0002: haven't heard from client bdaf51fb-edf4-e82f-51be-9141ee573c83 (at 10.8.26.33@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896cd87f5c00, cur 1549762329 expire 1549762179 last 1549762102
[245184.207550] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[245184.217664] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[245184.228040] Lustre: Skipped 2 previous similar messages
[245219.938436] LustreError: 55378:0:(mdt_io.c:442:mdt_preprw_write()) fir-MDT0002: BRW to missing obj [0x2c000338c:0xd40:0x0]
[245321.659867] LustreError: 55186:0:(mdt_io.c:442:mdt_preprw_write()) fir-MDT0002: BRW to missing obj [0x2c000167c:0x187f:0x0]
[245408.552961] Lustre: MGS: Connection restored to a9d6fa94-aa1c-f70a-7cae-a43d37e08037 (at 10.8.26.33@o2ib6)
[245664.670032] LustreError: 55267:0:(mdt_io.c:442:mdt_preprw_write()) fir-MDT0002: BRW to missing obj [0x2c00033a4:0x4a4c:0x0]
[245758.267844] Lustre: MGS: Connection restored to 5c3f56c6-80cd-1194-6868-c8f2838246fa (at 10.8.15.5@o2ib6)
[245758.277507] Lustre: Skipped 2 previous similar messages
[245785.289802] Lustre: fir-MDT0002: Client bef16258-699d-0e14-bdeb-b454fac00d89 (at 10.9.112.15@o2ib4) reconnecting
[245785.300061] Lustre: Skipped 1 previous similar message
[245785.303603] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[245785.303606] Lustre: Skipped 2 previous similar messages
[245810.668486] Lustre: fir-MDT0002: haven't heard from client 1d2da3ac-e307-036e-f8a3-99f8b3ab4ed7 (at 10.8.15.5@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8967558f5400, cur 1549763105 expire 1549762955 last 1549762878
[245810.690217] Lustre: Skipped 1 previous similar message
[246003.261475] LustreError: 55381:0:(mdt_io.c:442:mdt_preprw_write()) fir-MDT0002: BRW to missing obj [0x2c0003376:0x26ad:0x0]
[246386.374597] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[246386.384707] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[246386.395058] Lustre: Skipped 2 previous similar messages
[246415.683715] Lustre: fir-MDT0002: haven't heard from client e44bc79e-ea6c-0264-2357-8d89f812546a (at 10.8.26.33@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8962a8bff800, cur 1549763710 expire 1549763560 last 1549763483
[246415.705504] Lustre: Skipped 2 previous similar messages
[246987.462946] Lustre: fir-MDT0002: Client bef16258-699d-0e14-bdeb-b454fac00d89 (at 10.9.112.15@o2ib4) reconnecting
[246987.470874] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[246987.483541] Lustre: Skipped 2 previous similar messages
[247465.710197] Lustre: fir-MDT0002: haven't heard from client 480e7060-8c53-d341-81f4-5a68a9d6f5d4 (at 10.8.15.5@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89632dfc3000, cur 1549764760 expire 1549764610 last 1549764533
[247465.731903] Lustre: Skipped 2 previous similar messages
[247588.542127] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[247588.552237] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[247588.562590] Lustre: Skipped 2 previous similar messages
[248189.632222] Lustre: fir-MDT0002: Client bef16258-699d-0e14-bdeb-b454fac00d89 (at 10.9.112.15@o2ib4) reconnecting
[248189.638337] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[248189.652848] Lustre: Skipped 2 previous similar messages
[248790.708966] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[248790.719083] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[248790.729451] Lustre: Skipped 2 previous similar messages
[249391.799667] Lustre: fir-MDT0002: Client bef16258-699d-0e14-bdeb-b454fac00d89 (at 10.9.112.15@o2ib4) reconnecting
[249391.804273] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[249391.820278] Lustre: Skipped 2 previous similar messages
[249992.874633] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[249992.884744] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[249992.895108] Lustre: Skipped 1 previous similar message
[250593.956321] Lustre: fir-MDT0002: Client bef16258-699d-0e14-bdeb-b454fac00d89 (at 10.9.112.15@o2ib4) reconnecting
[250593.966587] Lustre: Skipped 1 previous similar message
[250593.969963] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[250722.791791] Lustre: fir-MDT0002: haven't heard from client ca1af5b2-4b74-b03d-4a2b-13a823b2dc8f (at 10.8.15.10@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8961f97b3800, cur 1549768017 expire 1549767867 last 1549767790
[250722.813600] Lustre: Skipped 2 previous similar messages
[251195.040297] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[251195.050428] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[251195.060787] Lustre: Skipped 2 previous similar messages
[251796.122975] Lustre: fir-MDT0002: Client bef16258-699d-0e14-bdeb-b454fac00d89 (at 10.9.112.15@o2ib4) reconnecting
[251796.133235] Lustre: Skipped 1 previous similar message
[251796.136616] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[252397.207152] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[252397.217258] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[252397.227611] Lustre: Skipped 2 previous similar messages
[252998.280358] Lustre: fir-MDT0002: Client bef16258-699d-0e14-bdeb-b454fac00d89 (at 10.9.112.15@o2ib4) reconnecting
[252998.290617] Lustre: Skipped 1 previous similar message
[252998.295892] Lustre: fir-MDT0002: Connection restored to d3ee77a1-a15a-7c99-ed1b-9049b2fac11c (at 10.9.112.15@o2ib4)
[252998.306432] Lustre: Skipped 1 previous similar message
[253599.375080] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[253599.382617] Lustre: fir-MDT0002: Connection restored to d3ee77a1-a15a-7c99-ed1b-9049b2fac11c (at 10.9.112.15@o2ib4)
[253599.395706] Lustre: Skipped 2 previous similar messages
[254200.453493] Lustre: fir-MDT0002: Client bef16258-699d-0e14-bdeb-b454fac00d89 (at 10.9.112.15@o2ib4) reconnecting
[254200.463778] Lustre: fir-MDT0002: Connection restored to d3ee77a1-a15a-7c99-ed1b-9049b2fac11c (at 10.9.112.15@o2ib4)
[254200.474317] Lustre: Skipped 2 previous similar messages
[254801.543035] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[254801.550391] Lustre: fir-MDT0002: Connection restored to d3ee77a1-a15a-7c99-ed1b-9049b2fac11c (at 10.9.112.15@o2ib4)
[254801.563652] Lustre: Skipped 2 previous similar messages
[255402.620933] Lustre: fir-MDT0002: Client bef16258-699d-0e14-bdeb-b454fac00d89 (at 10.9.112.15@o2ib4) reconnecting
[255402.631221] Lustre: fir-MDT0002: Connection restored to d3ee77a1-a15a-7c99-ed1b-9049b2fac11c (at 10.9.112.15@o2ib4)
[255402.641753] Lustre: Skipped 2 previous similar messages
[256003.710314] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[256003.717635] Lustre: fir-MDT0002: Connection restored to d3ee77a1-a15a-7c99-ed1b-9049b2fac11c (at 10.9.112.15@o2ib4)
[256003.730942] Lustre: Skipped 2 previous similar messages
[256604.790111] Lustre: fir-MDT0002: Client bef16258-699d-0e14-bdeb-b454fac00d89 (at 10.9.112.15@o2ib4) reconnecting
[256604.800443] Lustre: fir-MDT0002: Connection restored to d3ee77a1-a15a-7c99-ed1b-9049b2fac11c (at 10.9.112.15@o2ib4)
[256604.811085] Lustre: Skipped 2 previous similar messages
[257205.877622] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[257205.886861] Lustre: fir-MDT0002: Connection restored to d3ee77a1-a15a-7c99-ed1b-9049b2fac11c (at 10.9.112.15@o2ib4)
[257205.898225] Lustre: Skipped 2 previous similar messages
[257806.957659] Lustre: fir-MDT0002: Client bef16258-699d-0e14-bdeb-b454fac00d89 (at 10.9.112.15@o2ib4) reconnecting
[257806.967948] Lustre: fir-MDT0002: Connection restored to d3ee77a1-a15a-7c99-ed1b-9049b2fac11c (at 10.9.112.15@o2ib4)
[257806.978473] Lustre: Skipped 2 previous similar messages
[258408.045323] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[258408.054715] Lustre: fir-MDT0002: Connection restored to d3ee77a1-a15a-7c99-ed1b-9049b2fac11c (at 10.9.112.15@o2ib4)
[258408.065936] Lustre: Skipped 2 previous similar messages
[259009.125332] Lustre: fir-MDT0002: Client bef16258-699d-0e14-bdeb-b454fac00d89 (at 10.9.112.15@o2ib4) reconnecting
[259009.135616] Lustre: fir-MDT0002: Connection restored to d3ee77a1-a15a-7c99-ed1b-9049b2fac11c (at 10.9.112.15@o2ib4)
[259009.146141] Lustre: Skipped 2 previous similar messages
[259610.202896] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[259610.212985] Lustre: Skipped 1 previous similar message
[259610.218239] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[259610.228599] Lustre: Skipped 1 previous similar message
[260211.292441] Lustre: fir-MDT0002: Client bef16258-699d-0e14-bdeb-b454fac00d89 (at 10.9.112.15@o2ib4) reconnecting
[260211.302701] Lustre: Skipped 1 previous similar message
[260211.304438] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[260812.374889] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[260812.384999] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[260812.395344] Lustre: Skipped 2 previous similar messages
[261413.459217] Lustre: fir-MDT0002: Client bef16258-699d-0e14-bdeb-b454fac00d89 (at 10.9.112.15@o2ib4) reconnecting
[261413.469480] Lustre: Skipped 1 previous similar message
[261413.471373] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[262014.541771] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[262014.551878] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[262014.562226] Lustre: Skipped 2 previous similar messages
[262615.626219] Lustre: fir-MDT0002: Client bef16258-699d-0e14-bdeb-b454fac00d89 (at 10.9.112.15@o2ib4) reconnecting
[262615.636488] Lustre: Skipped 1 previous similar message
[262615.638222] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[263216.708787] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[263216.718896] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[263216.729242] Lustre: Skipped 2 previous similar messages
[263817.783610] Lustre: fir-MDT0002: Client bef16258-699d-0e14-bdeb-b454fac00d89 (at 10.9.112.15@o2ib4) reconnecting
[263817.793878] Lustre: Skipped 1 previous similar message
[263817.799139] Lustre: fir-MDT0002: Connection restored to d3ee77a1-a15a-7c99-ed1b-9049b2fac11c (at 10.9.112.15@o2ib4)
[263817.809691] Lustre: Skipped 1 previous similar message
[264418.876605] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[264418.885579] Lustre: fir-MDT0002: Connection restored to d3ee77a1-a15a-7c99-ed1b-9049b2fac11c (at 10.9.112.15@o2ib4)
[264418.897231] Lustre: Skipped 2 previous similar messages
[264489.137273] Lustre: fir-MDT0000: haven't heard from client e69a58ac-0a54-448a-34ab-47e71ec425db (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89983fcf3400, cur 1549781783 expire 1549781633 last 1549781556
[264489.159084] Lustre: Skipped 2 previous similar messages
[265019.946484] Lustre: fir-MDT0002: Client bef16258-699d-0e14-bdeb-b454fac00d89 (at 10.9.112.15@o2ib4) reconnecting
[265019.956768] Lustre: fir-MDT0002: Connection restored to d3ee77a1-a15a-7c99-ed1b-9049b2fac11c (at 10.9.112.15@o2ib4)
[265019.967287] Lustre: Skipped 4 previous similar messages
[265621.043484] Lustre: fir-MDT0002: Client bef16258-699d-0e14-bdeb-b454fac00d89 (at 10.9.112.15@o2ib4) reconnecting
[265621.044744] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[265621.044747] Lustre: Skipped 1 previous similar message
[265621.069317] Lustre: Skipped 2 previous similar messages
[266222.115835] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[266222.125949] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[266222.136293] Lustre: Skipped 1 previous similar message
[266465.464802] Lustre: 63794:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549783752/real 1549783752] req@ffff898335b10900 x1624926412406864/t0(0) o104->fir-MDT0002@10.8.17.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549783759 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[266472.491976] Lustre: 63794:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549783759/real 1549783759] req@ffff898335b10900 x1624926412406864/t0(0) o104->fir-MDT0002@10.8.17.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549783766 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[266486.519329] Lustre: 63794:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549783773/real 1549783773] req@ffff898335b10900 x1624926412406864/t0(0) o104->fir-MDT0002@10.8.17.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549783780 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[266486.546670] Lustre: 63794:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message
[266507.556860] Lustre: 63794:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549783794/real 1549783794] req@ffff898335b10900 x1624926412406864/t0(0) o104->fir-MDT0002@10.8.17.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549783801 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[266507.584207] Lustre: 63794:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
[266542.594738] Lustre: 63794:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549783829/real 1549783829] req@ffff898335b10900 x1624926412406864/t0(0) o104->fir-MDT0002@10.8.17.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549783836 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[266542.622087] Lustre: 63794:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages
[266558.189415] Lustre: fir-MDT0002: haven't heard from client 994c77ca-5a3e-accc-3ddc-a08f18403cd1 (at 10.8.17.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8995cc2a4000, cur 1549783852 expire 1549783702 last 1549783625
[266558.211221] Lustre: Skipped 2 previous similar messages
[266568.194884] Lustre: fir-MDT0000: haven't heard from client 994c77ca-5a3e-accc-3ddc-a08f18403cd1 (at 10.8.17.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899633f30400, cur 1549783862 expire 1549783712 last 1549783635
[266823.212761] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[266823.216239] Lustre: fir-MDT0002: Connection restored to d3ee77a1-a15a-7c99-ed1b-9049b2fac11c (at 10.9.112.15@o2ib4)
[266823.216241] Lustre: Skipped 1 previous similar message
[266823.238606] Lustre: Skipped 2 previous similar messages
[267424.287171] Lustre: fir-MDT0002: Client bef16258-699d-0e14-bdeb-b454fac00d89 (at 10.9.112.15@o2ib4) reconnecting
[267424.297453] Lustre: fir-MDT0002: Connection restored to d3ee77a1-a15a-7c99-ed1b-9049b2fac11c (at 10.9.112.15@o2ib4)
[267424.307979] Lustre: Skipped 1 previous similar message
[268025.384390] Lustre: fir-MDT0002: Client bef16258-699d-0e14-bdeb-b454fac00d89 (at 10.9.112.15@o2ib4) reconnecting
[268025.385708] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[268025.385710] Lustre: Skipped 1 previous similar message
[268025.410234] Lustre: Skipped 2 previous similar messages
[268626.456780] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[268626.466892] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[268626.477237] Lustre: Skipped 1 previous similar message
[269227.553733] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[269227.557119] Lustre: fir-MDT0002: Connection restored to d3ee77a1-a15a-7c99-ed1b-9049b2fac11c (at 10.9.112.15@o2ib4)
[269227.557121] Lustre: Skipped 1 previous similar message
[269227.579559] Lustre: Skipped 2 previous similar messages
[269755.269870] Lustre: MGS: haven't heard from client 3e506739-c1da-6849-9d95-5f18052df34b (at 10.8.15.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89747168c400, cur 1549787049 expire 1549786899 last 1549786822
[269755.290879] Lustre: Skipped 1 previous similar message
[269766.276648] Lustre: fir-MDT0002: haven't heard from client 0cda1ca9-b849-4b59-7ce7-48abe2de3c2e (at 10.8.15.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8966d37ec800, cur 1549787060 expire 1549786910 last 1549786833
[269766.298375] Lustre: Skipped 1 previous similar message
[269828.628115] Lustre: fir-MDT0002: Client bef16258-699d-0e14-bdeb-b454fac00d89 (at 10.9.112.15@o2ib4) reconnecting
[269828.638398] Lustre: fir-MDT0002: Connection restored to d3ee77a1-a15a-7c99-ed1b-9049b2fac11c (at 10.9.112.15@o2ib4)
[269828.648939] Lustre: Skipped 1 previous similar message
[270429.724978] Lustre: fir-MDT0002: Client bef16258-699d-0e14-bdeb-b454fac00d89 (at 10.9.112.15@o2ib4) reconnecting
[270429.727520] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[270429.727522] Lustre: Skipped 1 previous similar message
[270429.750864] Lustre: Skipped 2 previous similar messages
[271030.798331] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[271030.808446] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[271030.818802] Lustre: Skipped 1 previous similar message
[271631.887895] Lustre: fir-MDT0002: Client bef16258-699d-0e14-bdeb-b454fac00d89 (at 10.9.112.15@o2ib4) reconnecting
[271631.895238] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[271631.895240] Lustre: Skipped 1 previous similar message
[271631.913729] Lustre: Skipped 2 previous similar messages
[272232.966102] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[272232.976211] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[272232.986556] Lustre: Skipped 1 previous similar message
[272834.060820] Lustre: fir-MDT0002: Client bef16258-699d-0e14-bdeb-b454fac00d89 (at 10.9.112.15@o2ib4) reconnecting
[272834.063019] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[272834.081426] Lustre: Skipped 2 previous similar messages
[273435.133850] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[273435.143959] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[273435.154305] Lustre: Skipped 1 previous similar message
[274036.218539] Lustre: fir-MDT0002: Client bef16258-699d-0e14-bdeb-b454fac00d89 (at 10.9.112.15@o2ib4) reconnecting
[274036.228803] Lustre: Skipped 1 previous similar message
[274036.230778] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[274637.301601] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[274637.311717] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[274637.322072] Lustre: Skipped 2 previous similar messages
[275238.376426] Lustre: fir-MDT0002: Client bef16258-699d-0e14-bdeb-b454fac00d89 (at 10.9.112.15@o2ib4) reconnecting
[275238.386690] Lustre: Skipped 1 previous similar message
[275238.391952] Lustre: fir-MDT0002: Connection restored to d3ee77a1-a15a-7c99-ed1b-9049b2fac11c (at 10.9.112.15@o2ib4)
[275238.402486] Lustre: Skipped 1 previous similar message
[275839.469385] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[275839.478398] Lustre: fir-MDT0002: Connection restored to d3ee77a1-a15a-7c99-ed1b-9049b2fac11c (at 10.9.112.15@o2ib4)
[275839.490009] Lustre: Skipped 2 previous similar messages
[276440.549060] Lustre: fir-MDT0002: Client bef16258-699d-0e14-bdeb-b454fac00d89 (at 10.9.112.15@o2ib4) reconnecting
[276440.559344] Lustre: fir-MDT0002: Connection restored to d3ee77a1-a15a-7c99-ed1b-9049b2fac11c (at 10.9.112.15@o2ib4)
[276440.569878] Lustre: Skipped 2 previous similar messages
[277041.637162] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[277041.646229] Lustre: fir-MDT0002: Connection restored to d3ee77a1-a15a-7c99-ed1b-9049b2fac11c (at 10.9.112.15@o2ib4)
[277041.657790] Lustre: Skipped 2 previous similar messages
[277642.716875] Lustre: fir-MDT0002: Client bef16258-699d-0e14-bdeb-b454fac00d89 (at 10.9.112.15@o2ib4) reconnecting
[277642.727156] Lustre: fir-MDT0002: Connection restored to d3ee77a1-a15a-7c99-ed1b-9049b2fac11c (at 10.9.112.15@o2ib4)
[277642.737696] Lustre: Skipped 2 previous similar messages
[278243.805853] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[278243.813888] Lustre: fir-MDT0002: Connection restored to d3ee77a1-a15a-7c99-ed1b-9049b2fac11c (at 10.9.112.15@o2ib4)
[278243.826470] Lustre: Skipped 2 previous similar messages
[278844.885540] Lustre: fir-MDT0002: Client bef16258-699d-0e14-bdeb-b454fac00d89 (at 10.9.112.15@o2ib4) reconnecting
[278844.895823] Lustre: fir-MDT0002: Connection restored to d3ee77a1-a15a-7c99-ed1b-9049b2fac11c (at 10.9.112.15@o2ib4)
[278844.906341] Lustre: Skipped 2 previous similar messages
[279445.962482] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[279445.972573] Lustre: Skipped 1 previous similar message
[279445.977834] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[279445.988215] Lustre: Skipped 1 previous similar message
[280047.043123] Lustre: fir-MDT0002: Client bef16258-699d-0e14-bdeb-b454fac00d89 (at 10.9.112.15@o2ib4) reconnecting
[280047.053386] Lustre: Skipped 1 previous similar message
[280047.058646] Lustre: fir-MDT0002: Connection restored to d3ee77a1-a15a-7c99-ed1b-9049b2fac11c (at 10.9.112.15@o2ib4)
[280047.069162] Lustre: Skipped 1 previous similar message [280648.134722] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [280648.144823] Lustre: Skipped 1 previous similar message [280648.144868] Lustre: fir-MDT0002: Connection restored to d3ee77a1-a15a-7c99-ed1b-9049b2fac11c (at 10.9.112.15@o2ib4) [281249.215586] Lustre: fir-MDT0002: Client bef16258-699d-0e14-bdeb-b454fac00d89 (at 10.9.112.15@o2ib4) reconnecting [281249.225870] Lustre: fir-MDT0002: Connection restored to d3ee77a1-a15a-7c99-ed1b-9049b2fac11c (at 10.9.112.15@o2ib4) [281249.236413] Lustre: Skipped 2 previous similar messages [281850.301936] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [281850.312025] Lustre: Skipped 1 previous similar message [281850.312212] Lustre: fir-MDT0002: Connection restored to d3ee77a1-a15a-7c99-ed1b-9049b2fac11c (at 10.9.112.15@o2ib4) [282451.382583] Lustre: fir-MDT0002: Client bef16258-699d-0e14-bdeb-b454fac00d89 (at 10.9.112.15@o2ib4) reconnecting [282451.392867] Lustre: fir-MDT0002: Connection restored to d3ee77a1-a15a-7c99-ed1b-9049b2fac11c (at 10.9.112.15@o2ib4) [282451.403388] Lustre: Skipped 2 previous similar messages [283052.469175] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [283052.479261] Lustre: Skipped 1 previous similar message [283052.479408] Lustre: fir-MDT0002: Connection restored to d3ee77a1-a15a-7c99-ed1b-9049b2fac11c (at 10.9.112.15@o2ib4) [283653.549999] Lustre: fir-MDT0002: Client bef16258-699d-0e14-bdeb-b454fac00d89 (at 10.9.112.15@o2ib4) reconnecting [283653.560281] Lustre: fir-MDT0002: Connection restored to d3ee77a1-a15a-7c99-ed1b-9049b2fac11c (at 10.9.112.15@o2ib4) [283653.570802] Lustre: Skipped 2 previous similar messages [284254.636541] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [284254.646648] Lustre: Skipped 1 previous similar 
message [284254.646770] Lustre: fir-MDT0002: Connection restored to d3ee77a1-a15a-7c99-ed1b-9049b2fac11c (at 10.9.112.15@o2ib4) [284855.707518] Lustre: fir-MDT0002: Client bef16258-699d-0e14-bdeb-b454fac00d89 (at 10.9.112.15@o2ib4) reconnecting [284855.717801] Lustre: fir-MDT0002: Connection restored to d3ee77a1-a15a-7c99-ed1b-9049b2fac11c (at 10.9.112.15@o2ib4) [284855.728323] Lustre: Skipped 1 previous similar message [285456.803902] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [285456.804017] Lustre: fir-MDT0002: Connection restored to d3ee77a1-a15a-7c99-ed1b-9049b2fac11c (at 10.9.112.15@o2ib4) [285456.824533] Lustre: Skipped 2 previous similar messages [286057.864636] Lustre: fir-MDT0002: Client bef16258-699d-0e14-bdeb-b454fac00d89 (at 10.9.112.15@o2ib4) reconnecting [286057.874923] Lustre: fir-MDT0002: Connection restored to d3ee77a1-a15a-7c99-ed1b-9049b2fac11c (at 10.9.112.15@o2ib4) [286057.885450] Lustre: Skipped 1 previous similar message [286658.961339] Lustre: fir-MDT0002: Client bef16258-699d-0e14-bdeb-b454fac00d89 (at 10.9.112.15@o2ib4) reconnecting [286658.971479] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [286658.971482] Lustre: Skipped 1 previous similar message [286658.987177] Lustre: Skipped 2 previous similar messages [287260.042048] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [287260.052168] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [287260.062516] Lustre: Skipped 1 previous similar message [287861.133709] Lustre: fir-MDT0002: Client bef16258-699d-0e14-bdeb-b454fac00d89 (at 10.9.112.15@o2ib4) reconnecting [287861.139492] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [287861.154316] Lustre: Skipped 2 previous similar messages [288462.210436] Lustre: fir-MDT0002: Client 
0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [288462.220584] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [288462.230987] Lustre: Skipped 2 previous similar messages [289063.291205] Lustre: fir-MDT0002: Client bef16258-699d-0e14-bdeb-b454fac00d89 (at 10.9.112.15@o2ib4) reconnecting [289063.301463] Lustre: Skipped 1 previous similar message [289063.306722] Lustre: fir-MDT0002: Connection restored to d3ee77a1-a15a-7c99-ed1b-9049b2fac11c (at 10.9.112.15@o2ib4) [289063.317276] Lustre: Skipped 1 previous similar message [289664.377682] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [289664.387772] Lustre: Skipped 1 previous similar message [289664.392776] Lustre: fir-MDT0002: Connection restored to d3ee77a1-a15a-7c99-ed1b-9049b2fac11c (at 10.9.112.15@o2ib4) [290155.789100] Lustre: fir-MDT0000: haven't heard from client 5bd726e6-42ac-2c8a-e06c-c3087c474126 (at 10.8.21.19@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff896e857c8c00, cur 1549807449 expire 1549807299 last 1549807222 [290265.462940] Lustre: fir-MDT0002: Client bef16258-699d-0e14-bdeb-b454fac00d89 (at 10.9.112.15@o2ib4) reconnecting [290265.473232] Lustre: fir-MDT0002: Connection restored to d3ee77a1-a15a-7c99-ed1b-9049b2fac11c (at 10.9.112.15@o2ib4) [290265.483761] Lustre: Skipped 5 previous similar messages [290866.544402] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [290866.554493] Lustre: Skipped 1 previous similar message [290866.559209] Lustre: fir-MDT0002: Connection restored to d3ee77a1-a15a-7c99-ed1b-9049b2fac11c (at 10.9.112.15@o2ib4) [291467.619512] Lustre: fir-MDT0002: Client bef16258-699d-0e14-bdeb-b454fac00d89 (at 10.9.112.15@o2ib4) reconnecting [291467.629791] Lustre: fir-MDT0002: Connection restored to d3ee77a1-a15a-7c99-ed1b-9049b2fac11c (at 10.9.112.15@o2ib4) [291467.640315] Lustre: Skipped 1 previous similar message [291541.818022] Lustre: fir-MDT0002: haven't heard from client 572fee18-597b-f5ad-f93d-9178ef57a0e3 (at 10.8.21.19@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8970f2b69000, cur 1549808835 expire 1549808685 last 1549808608 [291541.839847] Lustre: Skipped 2 previous similar messages [292068.711687] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [292068.715695] Lustre: fir-MDT0002: Connection restored to d3ee77a1-a15a-7c99-ed1b-9049b2fac11c (at 10.9.112.15@o2ib4) [292068.715697] Lustre: Skipped 3 previous similar messages [292068.737606] Lustre: Skipped 2 previous similar messages [292669.786273] Lustre: fir-MDT0002: Client bef16258-699d-0e14-bdeb-b454fac00d89 (at 10.9.112.15@o2ib4) reconnecting [292669.796562] Lustre: fir-MDT0002: Connection restored to d3ee77a1-a15a-7c99-ed1b-9049b2fac11c (at 10.9.112.15@o2ib4) [292669.807089] Lustre: Skipped 1 previous similar message [293270.883078] Lustre: fir-MDT0002: Client bef16258-699d-0e14-bdeb-b454fac00d89 (at 10.9.112.15@o2ib4) reconnecting [293270.884177] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [293270.884179] Lustre: Skipped 1 previous similar message [293270.908903] Lustre: Skipped 2 previous similar messages [293871.955123] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [293871.965233] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [293871.975579] Lustre: Skipped 1 previous similar message [294473.052112] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [294473.055910] Lustre: fir-MDT0002: Connection restored to d3ee77a1-a15a-7c99-ed1b-9049b2fac11c (at 10.9.112.15@o2ib4) [294473.055913] Lustre: Skipped 1 previous similar message [294473.077964] Lustre: Skipped 2 previous similar messages [295074.116567] Lustre: fir-MDT0002: Client bef16258-699d-0e14-bdeb-b454fac00d89 (at 10.9.112.15@o2ib4) reconnecting [295074.126847] Lustre: fir-MDT0002: Connection restored to d3ee77a1-a15a-7c99-ed1b-9049b2fac11c (at 
10.9.112.15@o2ib4) [295074.137364] Lustre: Skipped 1 previous similar message [295675.213311] Lustre: fir-MDT0002: Client bef16258-699d-0e14-bdeb-b454fac00d89 (at 10.9.112.15@o2ib4) reconnecting [295675.223574] Lustre: Skipped 1 previous similar message [295675.224699] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [295675.224702] Lustre: Skipped 1 previous similar message [296276.295165] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [296276.305280] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [296276.315633] Lustre: Skipped 2 previous similar messages [296877.385153] Lustre: fir-MDT0002: Client bef16258-699d-0e14-bdeb-b454fac00d89 (at 10.9.112.15@o2ib4) reconnecting [296877.391840] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [296877.405751] Lustre: Skipped 2 previous similar messages [297478.461878] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [297478.471988] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [297478.482337] Lustre: Skipped 2 previous similar messages [298079.541852] Lustre: fir-MDT0002: Client bef16258-699d-0e14-bdeb-b454fac00d89 (at 10.9.112.15@o2ib4) reconnecting [298079.552119] Lustre: Skipped 1 previous similar message [298079.557376] Lustre: fir-MDT0002: Connection restored to d3ee77a1-a15a-7c99-ed1b-9049b2fac11c (at 10.9.112.15@o2ib4) [298079.567908] Lustre: Skipped 1 previous similar message [298680.628841] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [298680.638929] Lustre: Skipped 1 previous similar message [298680.643676] Lustre: fir-MDT0002: Connection restored to d3ee77a1-a15a-7c99-ed1b-9049b2fac11c (at 10.9.112.15@o2ib4) [299281.704667] Lustre: fir-MDT0002: 
Client bef16258-699d-0e14-bdeb-b454fac00d89 (at 10.9.112.15@o2ib4) reconnecting [299281.714948] Lustre: fir-MDT0002: Connection restored to d3ee77a1-a15a-7c99-ed1b-9049b2fac11c (at 10.9.112.15@o2ib4) [299281.725484] Lustre: Skipped 1 previous similar message [299851.132079] Lustre: MGS: Received new LWP connection from 10.8.14.7@o2ib6, removing former export from same NID [299866.576853] Lustre: MGS: Received new LWP connection from 10.8.14.7@o2ib6, removing former export from same NID [299882.797436] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [299882.802078] Lustre: fir-MDT0002: Connection restored to d3ee77a1-a15a-7c99-ed1b-9049b2fac11c (at 10.9.112.15@o2ib4) [299882.802080] Lustre: Skipped 3 previous similar messages [299882.823374] Lustre: Skipped 3 previous similar messages [299891.700521] LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.14.7@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. [299891.717802] LustreError: Skipped 15 previous similar messages [300085.624395] LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.14.7@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. [300085.641676] LustreError: Skipped 1 previous similar message [300110.802247] Lustre: MGS: Received new LWP connection from 10.8.14.7@o2ib6, removing former export from same NID [300348.042069] Lustre: MGS: haven't heard from client d5e8c8b2-1c81-9ffd-59f7-e9a28b612742 (at 10.8.9.2@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8961b864dc00, cur 1549817641 expire 1549817491 last 1549817414 [300348.062993] Lustre: Skipped 2 previous similar messages [300483.863621] Lustre: fir-MDT0002: Client bef16258-699d-0e14-bdeb-b454fac00d89 (at 10.9.112.15@o2ib4) reconnecting [300483.873887] Lustre: Skipped 4 previous similar messages [300483.879245] Lustre: fir-MDT0002: Connection restored to d3ee77a1-a15a-7c99-ed1b-9049b2fac11c (at 10.9.112.15@o2ib4) [300483.889780] Lustre: Skipped 9 previous similar messages [300640.051652] Lustre: fir-MDT0002: haven't heard from client d0eadcc9-2946-b94d-0b9f-8d123871ad52 (at 10.8.18.35@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896760ae3800, cur 1549817933 expire 1549817783 last 1549817706 [300640.073450] Lustre: Skipped 2 previous similar messages [300749.789723] LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.14.7@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. [300774.613100] Lustre: MGS: Received new LWP connection from 10.8.14.7@o2ib6, removing former export from same NID [301031.458208] Lustre: MGS: Received new LWP connection from 10.8.14.7@o2ib6, removing former export from same NID [301084.966668] Lustre: fir-MDT0002: Client bef16258-699d-0e14-bdeb-b454fac00d89 (at 10.9.112.15@o2ib4) reconnecting [301084.972356] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [301084.972358] Lustre: Skipped 10 previous similar messages [301084.992794] Lustre: Skipped 6 previous similar messages [301547.713584] Lustre: MGS: Received new LWP connection from 10.8.14.7@o2ib6, removing former export from same NID [301686.043603] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [301686.053712] Lustre: Skipped 2 previous similar messages [301686.059060] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) 
[301686.069448] Lustre: Skipped 5 previous similar messages [302280.819740] Lustre: MGS: Received new LWP connection from 10.8.14.7@o2ib6, removing former export from same NID [302287.140011] Lustre: fir-MDT0002: Client bef16258-699d-0e14-bdeb-b454fac00d89 (at 10.9.112.15@o2ib4) reconnecting [302287.145832] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [302287.145834] Lustre: Skipped 3 previous similar messages [302287.165934] Lustre: Skipped 4 previous similar messages [302888.216549] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [302888.226658] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [302888.237025] Lustre: Skipped 1 previous similar message [303489.312883] Lustre: fir-MDT0002: Client bef16258-699d-0e14-bdeb-b454fac00d89 (at 10.9.112.15@o2ib4) reconnecting [303489.313395] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [303489.333494] Lustre: Skipped 2 previous similar messages [303692.481982] Lustre: 56079:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549820978/real 1549820978] req@ffff8996f730dd00 x1624926607062720/t0(0) o104->fir-MDT0002@10.9.114.5@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1549820985 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 [303692.509337] Lustre: 56079:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages [303706.519340] Lustre: 56079:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549820992/real 1549820992] req@ffff8996f730dd00 x1624926607062720/t0(0) o104->fir-MDT0002@10.9.114.5@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1549820999 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [303706.546675] Lustre: 56079:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message [303727.556863] Lustre: 
56079:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549821013/real 1549821013] req@ffff8996f730dd00 x1624926607062720/t0(0) o104->fir-MDT0002@10.9.114.5@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1549821020 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [303727.584201] Lustre: 56079:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages [303761.877724] Lustre: 56099:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549821047/real 1549821047] req@ffff896be5254e00 x1624926607303312/t0(0) o104->fir-MDT0002@10.9.114.5@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1549821054 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [303761.905079] Lustre: 56099:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 12 previous similar messages [303790.596463] LustreError: 56079:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.9.114.5@o2ib4) failed to reply to blocking AST (req@ffff8996f730dd00 x1624926607062720 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff89857ae5a880/0x1a4b7ac48c3efa08 lrc: 4/0,0 mode: PR/PR res: [0x2c0001745:0x587d:0x0].0x0 bits 0x13/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.9.114.5@o2ib4 remote: 0xe82a9cee2c6c6917 expref: 3329 pid: 57414 timeout: 303875 lvb_type: 0 [303790.639408] LustreError: 138-a: fir-MDT0002: A client on nid 10.9.114.5@o2ib4 was evicted due to a lock blocking callback time out: rc -110 [303790.652028] LustreError: 54055:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 105s: evicting client at 10.9.114.5@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff89857ae5a880/0x1a4b7ac48c3efa08 lrc: 3/0,0 mode: PR/PR res: [0x2c0001745:0x587d:0x0].0x0 bits 0x13/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.9.114.5@o2ib4 remote: 0xe82a9cee2c6c6917 expref: 3330 pid: 57414 timeout: 0 lvb_type: 0 [303857.126662] Lustre: fir-MDT0000: haven't heard from client f92f9622-3835-3057-15b3-90b2bfd416b2 (at 
10.9.114.5@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899752203400, cur 1549821150 expire 1549821000 last 1549820923 [303857.148455] Lustre: Skipped 2 previous similar messages [304090.372907] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [304090.383023] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [304090.393366] Lustre: Skipped 1 previous similar message [304173.133888] Lustre: fir-MDT0002: haven't heard from client 7cee7bf7-9aa1-cc50-5aed-b23b669bf632 (at 10.8.15.2@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896187fcc000, cur 1549821466 expire 1549821316 last 1549821239 [304173.155614] Lustre: Skipped 1 previous similar message [304691.469258] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [304691.469988] Lustre: fir-MDT0002: Connection restored to d3ee77a1-a15a-7c99-ed1b-9049b2fac11c (at 10.9.112.15@o2ib4) [304691.469990] Lustre: Skipped 4 previous similar messages [304691.495197] Lustre: Skipped 2 previous similar messages [305292.530357] Lustre: fir-MDT0002: Client bef16258-699d-0e14-bdeb-b454fac00d89 (at 10.9.112.15@o2ib4) reconnecting [305292.540649] Lustre: fir-MDT0002: Connection restored to d3ee77a1-a15a-7c99-ed1b-9049b2fac11c (at 10.9.112.15@o2ib4) [305292.551172] Lustre: Skipped 1 previous similar message [305299.175825] Lustre: fir-MDT0002: haven't heard from client a2451203-1d0a-2877-c580-e76d5d55f570 (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff897633f11400, cur 1549822592 expire 1549822442 last 1549822365 [305299.197527] Lustre: Skipped 2 previous similar messages [305893.641840] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [305893.651931] Lustre: Skipped 1 previous similar message [305893.657187] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [305893.667548] Lustre: Skipped 4 previous similar messages [305944.179760] Lustre: fir-MDT0002: haven't heard from client bc33194c-64c7-c6da-7f2c-18f521e68033 (at 10.8.17.17@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89712da4e400, cur 1549823237 expire 1549823087 last 1549823010 [305944.201573] Lustre: Skipped 2 previous similar messages [306020.181496] Lustre: fir-MDT0000: haven't heard from client 5fa2be43-ef73-0106-b068-07493ef16b97 (at 10.8.24.24@o2ib6) in 193 seconds. I think it's dead, and I am evicting it. exp ffff899709f9a800, cur 1549823313 expire 1549823163 last 1549823120 [306020.203282] Lustre: Skipped 50 previous similar messages [306494.743410] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [306494.753516] Lustre: Skipped 2 previous similar messages [306494.758860] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [306494.769208] Lustre: Skipped 5 previous similar messages [306906.617353] LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.14.7@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. [306906.634639] LustreError: Skipped 2 previous similar messages [306928.202392] Lustre: fir-MDT0002: haven't heard from client 46ffc5c0-5221-592c-b4b8-0937c3c0dccb (at 10.8.14.7@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8964da789800, cur 1549824221 expire 1549824071 last 1549823994 [306928.224096] Lustre: Skipped 2 previous similar messages [307095.845119] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [307095.855225] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [307095.865589] Lustre: Skipped 9 previous similar messages [307462.216623] Lustre: fir-MDT0002: haven't heard from client c9c466de-2010-da89-de6a-267ff847464e (at 10.8.26.33@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896bdb364400, cur 1549824755 expire 1549824605 last 1549824528 [307462.238409] Lustre: Skipped 2 previous similar messages [307538.220537] Lustre: fir-MDT0002: haven't heard from client 313817d5-fec2-02c4-445d-e59ed224bf6e (at 10.8.18.35@o2ib6) in 208 seconds. I think it's dead, and I am evicting it. exp ffff896beef0d000, cur 1549824831 expire 1549824681 last 1549824623 [307538.242340] Lustre: Skipped 2 previous similar messages [307696.941842] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [307696.951954] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [307696.962303] Lustre: Skipped 18 previous similar messages [307767.226358] Lustre: fir-MDT0002: haven't heard from client 1b7eb91b-6d34-d2d9-adec-fec070392a7e (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8960752cf400, cur 1549825060 expire 1549824910 last 1549824833 [307767.248155] Lustre: Skipped 2 previous similar messages [308298.038665] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [308298.048773] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [308298.059120] Lustre: Skipped 45 previous similar messages [308429.240696] Lustre: fir-MDT0002: haven't heard from client e1d528fc-8915-cf70-51f1-7bdbdb7bb7a5 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896bdaec0000, cur 1549825722 expire 1549825572 last 1549825495 [308429.262488] Lustre: Skipped 2 previous similar messages [308765.252111] Lustre: fir-MDT0002: haven't heard from client debffa97-2639-9c31-35f1-ecb330d410d8 (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8981767fec00, cur 1549826058 expire 1549825908 last 1549825831 [308765.273812] Lustre: Skipped 2 previous similar messages [308899.135641] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [308899.145753] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [308899.156098] Lustre: Skipped 9 previous similar messages [308923.252278] Lustre: fir-MDT0002: haven't heard from client 2612f70b-5df1-0f42-d224-a01075aea84e (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896bd877bc00, cur 1549826216 expire 1549826066 last 1549825989 [308923.274081] Lustre: Skipped 2 previous similar messages [309418.264821] Lustre: fir-MDT0002: haven't heard from client 06362adb-bce3-1f8e-474a-7bd885f0ac07 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff896bd9e7a800, cur 1549826711 expire 1549826561 last 1549826484
[309418.286632] Lustre: Skipped 2 previous similar messages
[309500.232248] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[309500.242369] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[309500.252716] Lustre: Skipped 6 previous similar messages
[310022.317063] Lustre: fir-MDT0000: haven't heard from client b445b87c-752a-495e-1deb-b67668cf85a8 (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8997b4f03c00, cur 1549827315 expire 1549827165 last 1549827088
[310022.338856] Lustre: Skipped 5 previous similar messages
[310033.296580] Lustre: fir-MDT0002: haven't heard from client b445b87c-752a-495e-1deb-b67668cf85a8 (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff898243e1a000, cur 1549827326 expire 1549827176 last 1549827099
[310098.283590] Lustre: fir-MDT0000: haven't heard from client 64936eed-9476-c2d9-bbf0-67ab20448baa (at 10.8.11.22@o2ib6) in 206 seconds. I think it's dead, and I am evicting it. exp ffff896bd6244c00, cur 1549827391 expire 1549827241 last 1549827185
[310098.305397] Lustre: Skipped 1 previous similar message
[310101.328907] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[310101.339017] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[310101.349378] Lustre: Skipped 6 previous similar messages
[310286.287395] Lustre: MGS: haven't heard from client cc5cf11b-626a-6e45-278d-0dda97c1bbb4 (at 10.8.26.33@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8996ea301c00, cur 1549827579 expire 1549827429 last 1549827352
[310286.308502] Lustre: Skipped 2 previous similar messages
[310364.972427] Lustre: 54667:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549827650/real 1549827650] req@ffff89606e6b0c00 x1624926656113152/t0(0) o104->fir-MDT0002@10.9.112.15@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1549827657 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[310364.999869] Lustre: 54667:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 14 previous similar messages
[310379.009779] Lustre: 54667:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549827664/real 1549827664] req@ffff89606e6b0c00 x1624926656113152/t0(0) o104->fir-MDT0002@10.9.112.15@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1549827671 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[310379.037214] Lustre: 54667:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message
[310400.047307] Lustre: 54667:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549827685/real 1549827685] req@ffff89606e6b0c00 x1624926656113152/t0(0) o104->fir-MDT0002@10.9.112.15@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1549827692 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[310400.074755] Lustre: 54667:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
[310435.085186] Lustre: 54667:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549827720/real 1549827720] req@ffff89606e6b0c00 x1624926656113152/t0(0) o104->fir-MDT0002@10.9.112.15@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1549827727 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[310435.112630] Lustre: 54667:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages
[310463.122910] LustreError: 54667:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.9.112.15@o2ib4) failed to reply to blocking AST (req@ffff89606e6b0c00 x1624926656113152 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff8970af3d9440/0x1a4b7ac4b65e46d8 lrc: 4/0,0 mode: PR/PR res: [0x2c0001745:0x5927:0x0].0x0 bits 0x13/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.9.112.15@o2ib4 remote: 0x43b35d7c799d7108 expref: 3273 pid: 54718 timeout: 310547 lvb_type: 0
[310463.166030] LustreError: 138-a: fir-MDT0002: A client on nid 10.9.112.15@o2ib4 was evicted due to a lock blocking callback time out: rc -110
[310463.178738] LustreError: 54055:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 105s: evicting client at 10.9.112.15@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8970af3d9440/0x1a4b7ac4b65e46d8 lrc: 3/0,0 mode: PR/PR res: [0x2c0001745:0x5927:0x0].0x0 bits 0x13/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.9.112.15@o2ib4 remote: 0x43b35d7c799d7108 expref: 3274 pid: 54718 timeout: 0 lvb_type: 0
[310537.309314] Lustre: MGS: haven't heard from client 2a71a25c-cd91-209e-a2a1-7dab8a8749ca (at 10.9.112.15@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8960742e5800, cur 1549827830 expire 1549827680 last 1549827603
[310537.330510] Lustre: Skipped 2 previous similar messages
[310702.425366] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[310702.435477] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[310702.445823] Lustre: Skipped 9 previous similar messages
[311303.521863] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[311303.531976] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[311727.323548] Lustre: fir-MDT0002: haven't heard from client b22e05ba-2b75-11fe-7710-708e418387f4 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff898105667400, cur 1549829020 expire 1549828870 last 1549828793
[311727.345339] Lustre: Skipped 4 previous similar messages
[311904.613372] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[311904.623486] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[311904.633832] Lustre: Skipped 3 previous similar messages
[312170.341945] Lustre: fir-MDT0000: haven't heard from client 3a1a746d-b1bb-29d1-aafa-9c2f3f723916 (at 10.8.26.33@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89606bf25c00, cur 1549829463 expire 1549829313 last 1549829236
[312170.363739] Lustre: Skipped 2 previous similar messages
[312427.340490] Lustre: MGS: haven't heard from client e3f6d349-8074-0864-8caf-d70e8cfea891 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff898211a4f000, cur 1549829720 expire 1549829570 last 1549829493
[312427.361589] Lustre: Skipped 2 previous similar messages
[312505.709889] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[312505.719997] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[312505.730342] Lustre: Skipped 3 previous similar messages
[312997.355249] Lustre: fir-MDT0000: haven't heard from client 9e138298-dd85-93a9-9508-40699f5ed24f (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8982be36d000, cur 1549830290 expire 1549830140 last 1549830063
[312997.377038] Lustre: Skipped 2 previous similar messages
[313106.806544] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[313106.816666] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[313106.827023] Lustre: Skipped 3 previous similar messages
[313212.480024] LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.14.7@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server.
[313212.497309] LustreError: Skipped 1 previous similar message
[313463.370774] Lustre: fir-MDT0000: haven't heard from client 46ffc5c0-5221-592c-b4b8-0937c3c0dccb (at 10.8.14.7@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896ce0ef6000, cur 1549830756 expire 1549830606 last 1549830529
[313463.392478] Lustre: Skipped 5 previous similar messages
[313707.903127] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[313707.913234] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[313707.923583] Lustre: Skipped 5 previous similar messages
[313849.382155] Lustre: fir-MDT0002: haven't heard from client 1d7ed545-667f-2ef8-6bba-6c20aaec9c9f (at 10.8.14.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8966d37ed000, cur 1549831142 expire 1549830992 last 1549830915
[313849.403930] Lustre: Skipped 4 previous similar messages
[314309.000151] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[314309.010265] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[314309.020639] Lustre: Skipped 6 previous similar messages
[314573.394182] Lustre: fir-MDT0002: haven't heard from client a733ad7e-6dc5-d9cf-2e54-457a75b969b8 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8982956aa800, cur 1549831866 expire 1549831716 last 1549831639
[314573.415974] Lustre: Skipped 5 previous similar messages
[314910.097087] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[314910.107196] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[314910.117577] Lustre: Skipped 6 previous similar messages
[315504.418146] Lustre: fir-MDT0002: haven't heard from client 4145f9ae-ef96-c859-ef6a-37d022f944a1 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896bc4ecc000, cur 1549832797 expire 1549832647 last 1549832570
[315504.439940] Lustre: Skipped 8 previous similar messages
[315511.193965] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[315511.204077] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[315511.214427] Lustre: Skipped 3 previous similar messages
[316112.290821] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[316112.300927] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[316112.311277] Lustre: Skipped 3 previous similar messages
[316397.440031] Lustre: fir-MDT0002: haven't heard from client c69ac8c6-44e2-66aa-8a62-befc0d012a83 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896bc32c6800, cur 1549833690 expire 1549833540 last 1549833463
[316397.461825] Lustre: Skipped 2 previous similar messages
[316713.387577] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[316713.397688] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[316713.408080] Lustre: Skipped 3 previous similar messages
[317314.484401] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[317314.494512] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[317314.504861] Lustre: Skipped 3 previous similar messages
[317915.581346] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[317915.591454] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[318516.672281] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[318516.682392] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[319117.763163] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[319117.773273] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[319718.853927] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[319718.864046] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[319926.529367] Lustre: fir-MDT0002: haven't heard from client 94219986-9952-af97-7955-bdd5fbf68578 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896059a74400, cur 1549837219 expire 1549837069 last 1549836992
[319926.551161] Lustre: Skipped 5 previous similar messages
[320319.944685] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[320319.954795] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[320319.965158] Lustre: Skipped 3 previous similar messages
[320921.041612] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[320921.051724] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[321521.577257] Lustre: fir-MDT0002: haven't heard from client 80186a91-5622-f710-936e-60b13bf6ba2a (at 10.8.13.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89868c3a6400, cur 1549838814 expire 1549838664 last 1549838587
[321521.599045] Lustre: Skipped 2 previous similar messages
[321522.132287] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[321522.142399] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[321977.580427] Lustre: fir-MDT0002: haven't heard from client 4794c7b4-7411-b975-ce48-240ce26e8779 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff898518a4f400, cur 1549839270 expire 1549839120 last 1549839043
[321977.602219] Lustre: Skipped 5 previous similar messages
[322053.585651] Lustre: fir-MDT0002: haven't heard from client 9033bad8-ed8f-0fba-ed34-b7eaabc0b49a (at 10.8.13.14@o2ib6) in 226 seconds. I think it's dead, and I am evicting it. exp ffff896baff3d000, cur 1549839346 expire 1549839196 last 1549839120
[322053.607447] Lustre: Skipped 2 previous similar messages
[322123.224249] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[322123.234363] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[322123.244731] Lustre: Skipped 6 previous similar messages
[322407.591157] Lustre: fir-MDT0002: haven't heard from client b2cdefaf-3191-800b-799e-7cdaa78de5a4 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896072a43800, cur 1549839700 expire 1549839550 last 1549839473
[322407.612961] Lustre: Skipped 2 previous similar messages
[322724.321248] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[322724.331356] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[322724.341705] Lustre: Skipped 6 previous similar messages
[322877.603277] Lustre: fir-MDT0002: haven't heard from client a2f56ef7-1248-d78f-14d6-d50e697363ff (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896f65214800, cur 1549840170 expire 1549840020 last 1549839943
[322877.625073] Lustre: Skipped 2 previous similar messages
[323325.418339] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[323325.428446] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[323325.438792] Lustre: Skipped 3 previous similar messages
[323340.619003] Lustre: fir-MDT0002: haven't heard from client e2fa953d-0d75-a01d-bb24-51144cf7f16e (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896bc1e2b800, cur 1549840633 expire 1549840483 last 1549840406
[323340.640791] Lustre: Skipped 2 previous similar messages
[323689.622984] Lustre: fir-MDT0002: haven't heard from client 83785fe3-bd11-caf3-a67d-c2afc4418cf5 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8982146a0c00, cur 1549840982 expire 1549840832 last 1549840755
[323689.644802] Lustre: Skipped 2 previous similar messages
[323926.515369] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[323926.525481] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[323926.535826] Lustre: Skipped 6 previous similar messages
[324527.612087] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[324527.622198] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[324527.632576] Lustre: Skipped 3 previous similar messages
[325128.708999] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[325128.719106] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[325552.675737] Lustre: fir-MDT0002: haven't heard from client a5dc9d81-fe89-1d0c-ba56-8e5850561c3c (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896baaac4c00, cur 1549842845 expire 1549842695 last 1549842618
[325552.697548] Lustre: Skipped 5 previous similar messages
[325729.799686] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[325729.809799] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[325729.820164] Lustre: Skipped 3 previous similar messages
[326330.896467] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[326330.906582] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[326899.704014] Lustre: fir-MDT0002: haven't heard from client e18eed75-ce52-cc42-be69-772ded053e90 (at 10.8.13.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8966d37ee800, cur 1549844192 expire 1549844042 last 1549843965
[326899.725806] Lustre: Skipped 2 previous similar messages
[326931.988042] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[326931.998159] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[327533.079703] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[327533.089814] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[328134.170346] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[328134.180460] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[328651.747322] Lustre: fir-MDT0002: haven't heard from client 5d54f6bb-ec60-93ce-23fe-959bd0c0b8ca (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896ba5af6800, cur 1549845944 expire 1549845794 last 1549845717
[328651.769121] Lustre: Skipped 2 previous similar messages
[328735.260980] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[328735.271086] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[329336.352661] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[329336.362773] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[329336.373116] Lustre: Skipped 3 previous similar messages
[329781.776855] Lustre: fir-MDT0002: haven't heard from client e40a0243-3183-a092-90fd-52eaa9dab2c5 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff898485739000, cur 1549847074 expire 1549846924 last 1549846847
[329781.798649] Lustre: Skipped 2 previous similar messages
[329826.037451] LustreError: 56386:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.13.23@o2ib6 arrived at 1549847118 with bad export cookie 1894743047037871528
[329937.449485] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[329937.459593] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[329937.469940] Lustre: Skipped 4 previous similar messages
[330052.782611] Lustre: fir-MDT0002: haven't heard from client e18eed75-ce52-cc42-be69-772ded053e90 (at 10.8.13.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8960403b5800, cur 1549847345 expire 1549847195 last 1549847118
[330052.804396] Lustre: Skipped 2 previous similar messages
[330276.788763] Lustre: fir-MDT0002: haven't heard from client f3c028c9-a3d9-d5a7-ddec-09f86ecca0cf (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8984d4b80800, cur 1549847569 expire 1549847419 last 1549847342
[330485.031502] LustreError: 56386:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.13.23@o2ib6 arrived at 1549847777 with bad export cookie 1894743047037884086
[330538.546676] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[330538.556784] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[330538.567149] Lustre: Skipped 4 previous similar messages
[330566.796366] Lustre: fir-MDT0002: haven't heard from client 3ab5bcaa-8d5e-27d9-5913-f9d8f76ca855 (at 10.8.11.17@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8995cc2a4800, cur 1549847859 expire 1549847709 last 1549847632
[330566.818156] Lustre: Skipped 2 previous similar messages
[330642.802952] Lustre: fir-MDT0000: haven't heard from client e18eed75-ce52-cc42-be69-772ded053e90 (at 10.8.13.23@o2ib6) in 158 seconds. I think it's dead, and I am evicting it. exp ffff8983dd79c000, cur 1549847935 expire 1549847785 last 1549847777
[330642.824769] Lustre: Skipped 2 previous similar messages
[330803.804091] Lustre: fir-MDT0002: haven't heard from client 8b55087e-0f17-47d4-05be-3ba21cb68a0a (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896b93f6bc00, cur 1549848096 expire 1549847946 last 1549847869
[331139.643951] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[331139.654075] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[331139.664439] Lustre: Skipped 6 previous similar messages
[331291.814002] Lustre: fir-MDT0002: haven't heard from client df4b39e0-88f5-81ca-04a5-b693fd6c93da (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896df7bd0400, cur 1549848584 expire 1549848434 last 1549848357
[331291.835802] Lustre: Skipped 2 previous similar messages
[331689.832702] Lustre: fir-MDT0002: haven't heard from client 3cfceea2-4eed-f978-0bc3-2e396fc23654 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8997cfa2e800, cur 1549848982 expire 1549848832 last 1549848755
[331689.854537] Lustre: Skipped 2 previous similar messages
[331740.741292] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[331740.751421] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[331740.761771] Lustre: Skipped 6 previous similar messages
[332161.843513] Lustre: fir-MDT0002: haven't heard from client 5b0f177a-8169-6c32-4835-7ca0ba838e11 (at 10.8.26.33@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8984ceec4c00, cur 1549849454 expire 1549849304 last 1549849227
[332161.865307] Lustre: Skipped 2 previous similar messages
[332341.838785] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[332341.848904] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[332341.859264] Lustre: Skipped 6 previous similar messages
[332942.936335] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[332942.946449] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[333544.027562] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[333544.037675] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[333674.873498] Lustre: fir-MDT0002: haven't heard from client bd6b0907-bbf0-754e-ba62-411999a5fe50 (at 10.8.15.1@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8961f97b6c00, cur 1549850967 expire 1549850817 last 1549850740
[333674.895211] Lustre: Skipped 2 previous similar messages
[334145.118551] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[334145.128660] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[334746.209299] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[334746.219408] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[335347.300014] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[335347.310165] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[335948.390668] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[335948.400781] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[336549.481246] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[336549.491368] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[337150.571843] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[337150.581951] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[337751.662376] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[337751.672494] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[338352.752927] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[338352.763039] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[338953.844357] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[338953.854469] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[339321.018240] Lustre: fir-MDT0002: haven't heard from client 8521bd31-1a2d-c432-8053-5d8be1fc3492 (at 10.8.9.8@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff898440b7b000, cur 1549856613 expire 1549856463 last 1549856386
[339321.039859] Lustre: Skipped 2 previous similar messages
[339328.019463] Lustre: MGS: haven't heard from client 8e045e6e-233a-35de-a2a4-f8e2840176c3 (at 10.8.9.8@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896bd96f2c00, cur 1549856620 expire 1549856470 last 1549856393
[339554.935660] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[339554.945771] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[339554.956130] Lustre: Skipped 3 previous similar messages
[339820.030533] Lustre: fir-MDT0002: haven't heard from client e18eed75-ce52-cc42-be69-772ded053e90 (at 10.8.13.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8997cbbafc00, cur 1549857112 expire 1549856962 last 1549856885
[339820.052342] Lustre: Skipped 1 previous similar message
[340156.032891] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[340156.043010] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[340156.053370] Lustre: Skipped 3 previous similar messages
[340389.045341] Lustre: fir-MDT0002: haven't heard from client e18eed75-ce52-cc42-be69-772ded053e90 (at 10.8.13.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896011326000, cur 1549857681 expire 1549857531 last 1549857454
[340389.067147] Lustre: Skipped 2 previous similar messages
[340757.130309] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[340757.140418] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[340757.150777] Lustre: Skipped 3 previous similar messages
[341609.122464] LNet: Service thread pid 54734 was inactive for 200.45s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[341609.139486] Pid: 54734, comm: mdt00_021 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
[341609.149314] Call Trace:
[341609.151870] [] ldlm_completion_ast+0x5b1/0x920 [ptlrpc]
[341609.158906] [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc]
[341609.166202] [] mdt_object_local_lock+0x50b/0xb20 [mdt]
[341609.173132] [] mdt_object_lock_internal+0x70/0x3e0 [mdt]
[341609.180215] [] mdt_getattr_name_lock+0x11d/0x1c30 [mdt]
[341609.187224] [] mdt_intent_getattr+0x2b5/0x480 [mdt]
[341609.193875] [] mdt_intent_policy+0x2e8/0xd00 [mdt]
[341609.200453] [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc]
[341609.207293] [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc]
[341609.214485] [] tgt_enqueue+0x62/0x210 [ptlrpc]
[341609.220774] [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[341609.227800] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[341609.235598] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
[341609.242030] [] kthread+0xd1/0xe0
[341609.247030] [] ret_from_fork_nospec_begin+0xe/0x21
[341609.253600] [] 0xffffffffffffffff
[341609.258720] LustreError: dumping log to /tmp/lustre-log.1549858901.54734
[341708.668969] LustreError: 54734:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1549858700, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff89607d727980/0x1a4b7ac4e6f26b0b lrc: 3/1,0 mode: --/PR res: [0x2c0003bcd:0xd9c:0x0].0x0 bits 0x12/0x0 rrc: 6 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 54734 timeout: 0 lvb_type: 0
[342003.302367] Lustre: 56074:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8957eca70f00 x1623970116998192/t0(0) o101->0ccdc4e2-9749-c9a5-afb4-85874ce74d6c@10.0.10.3@o2ib7:585/0 lens 576/3264 e 24 to 0 dl 1549859300 ref 2 fl Interpret:/0/0 rc 0/0
[342003.331374] Lustre: 56074:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message
[342009.725019] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[342009.735152] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[342610.816013] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[342610.826127] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[342809.828463] list passed to list_sort() too long for efficiency
[343211.906979] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[343211.917089] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[343812.997733] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[343813.007856] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[344414.088667] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[344414.098785] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[345015.179631] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[345015.189742] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[345616.270498] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[345616.280632] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[346217.361361] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[346217.371471] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[346818.452296] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[346818.462413] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[347419.542958] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[347419.553096] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[348020.633649] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[348020.643762] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[348621.724217] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[348621.734329] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[348635.189783] Lustre: 54726:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549865919/real 1549865919] req@ffff89716fb93f00 x1624927038422400/t0(0) o106->fir-MDT0000@10.8.15.7@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1549865926 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[348635.217065] Lustre: 54726:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages
[348649.227136] Lustre: 54726:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549865933/real 1549865933] req@ffff89716fb93f00 x1624927038422400/t0(0) o106->fir-MDT0000@10.8.15.7@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1549865940 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[348649.254411] Lustre: 54726:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages
[348670.265667] Lustre: 54726:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549865955/real 1549865955] req@ffff89716fb93f00 x1624927038422400/t0(0) o106->fir-MDT0000@10.8.15.7@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1549865962 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[348670.292950] Lustre: 54726:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 6 previous similar messages
[348705.266543] Lustre: 57444:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549865990/real 1549865990] req@ffff8983fffe0f00 x1624927038422912/t0(0) o106->fir-MDT0000@10.8.15.7@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1549865997 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[348705.293823] Lustre: 57444:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 8 previous similar messages
[348773.540898] LustreError: 56082:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.15.7@o2ib6) returned error from glimpse AST (req@ffff8975f3aa7b00 x1624927039742416 status -107 rc -107), evict it ns: mdt-fir-MDT0000_UUID lock: ffff89759c7733c0/0x1a4b7ac4d4ab722d lrc: 4/0,0 mode: PW/PW res: [0x200001804:0xdb69:0x0].0x0 bits 0x40/0x0 rrc: 7 type: IBT flags: 0x40200000000000 nid: 10.8.15.7@o2ib6 remote: 0x74cb7365634efe12 expref: 68 pid: 54672 timeout: 0 lvb_type: 0
[348773.540902] LustreError: 138-a: fir-MDT0000: A client on nid 10.8.15.7@o2ib6 was evicted due to a lock glimpse callback time out: rc -107
[348773.540904] LustreError: Skipped 1 previous similar message
[348773.540926] LustreError: 54055:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 176s: evicting client at 10.8.15.7@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8970afa6f500/0x1a4b7ac50bad8ed2 lrc: 4/0,0 mode: PW/PW res: [0x200003fd6:0x95:0x0].0x0 bits 0x40/0x0 rrc: 6 type: IBT flags: 0x40200000000000 nid: 10.8.15.7@o2ib6 remote: 0x74cb7365634f2659 expref: 69 pid: 54693 timeout: 0 lvb_type: 0
[348773.638498] LustreError: 56082:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) Skipped 2 previous similar messages
[348828.253995] Lustre: fir-MDT0002: haven't heard from client 0d191f02-38d6-3951-c250-cb88a08f4b30 (at 10.8.15.7@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8984d7682400, cur 1549866120 expire 1549865970 last 1549865893
[348828.275723] Lustre: Skipped 2 previous similar messages
[349222.814797] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[349222.824937] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[349222.835287] Lustre: Skipped 3 previous similar messages
[349823.911346] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[349823.921461] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[349941.282730] Lustre: fir-MDT0000: haven't heard from client 5b70eaeb-9b1d-7d91-4f54-a3b1ba65e969 (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896d4ffb8c00, cur 1549867233 expire 1549867083 last 1549867006
[349941.304450] Lustre: Skipped 1 previous similar message
[350425.001842] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[350425.011971] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[350425.022327] Lustre: Skipped 6 previous similar messages
[350436.294081] Lustre: fir-MDT0000: haven't heard from client 20072381-6c4e-2e16-e597-e58d87d2d63f (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896ba8fe5800, cur 1549867728 expire 1549867578 last 1549867501
[350436.315893] Lustre: Skipped 5 previous similar messages
[351011.308651] Lustre: fir-MDT0002: haven't heard from client 02370cb9-c120-e247-f0d2-7797a4f951b4 (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896e8bb8b000, cur 1549868303 expire 1549868153 last 1549868076
[351011.330362] Lustre: Skipped 2 previous similar messages
[351026.098346] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[351026.108454] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[351026.118819] Lustre: Skipped 3 previous similar messages
[351277.318146] Lustre: fir-MDT0000: haven't heard from client 846fc7c1-9fac-9eea-ec19-a0c2ec37db81 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it.
exp ffff89678c357000, cur 1549868569 expire 1549868419 last 1549868342 [351277.339953] Lustre: Skipped 2 previous similar messages [351627.195017] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [351627.205127] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [351627.215490] Lustre: Skipped 6 previous similar messages [352032.518041] Lustre: 54680:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549869317/real 1549869317] req@ffff8957cda51200 x1624927075694784/t0(0) o104->fir-MDT0002@10.8.11.22@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549869324 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 [352032.545403] Lustre: 54680:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 22 previous similar messages [352046.555395] Lustre: 54680:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549869331/real 1549869331] req@ffff8957cda51200 x1624927075694784/t0(0) o104->fir-MDT0002@10.8.11.22@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549869338 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [352046.582745] Lustre: 54680:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message [352067.592919] Lustre: 54680:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549869352/real 1549869352] req@ffff8957cda51200 x1624927075694784/t0(0) o104->fir-MDT0002@10.8.11.22@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549869359 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [352067.620274] Lustre: 54680:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages [352102.631797] Lustre: 54680:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549869387/real 1549869387] req@ffff8957cda51200 x1624927075694784/t0(0) o104->fir-MDT0002@10.8.11.22@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549869394 ref 1 fl 
Rpc:X/2/ffffffff rc 0/-1 [352102.659160] Lustre: 54680:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages [352172.671559] Lustre: 54680:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549869457/real 1549869457] req@ffff8957cda51200 x1624927075694784/t0(0) o104->fir-MDT0002@10.8.11.22@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549869464 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [352172.698916] Lustre: 54680:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 9 previous similar messages [352179.708748] LustreError: 54680:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.11.22@o2ib6) failed to reply to blocking AST (req@ffff8957cda51200 x1624927075694784 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff896b73709f80/0x1a4b7ac5168363e0 lrc: 4/0,0 mode: PR/PR res: [0x2c0003beb:0x8009:0x0].0x0 bits 0x13/0x0 rrc: 3 type: IBT flags: 0x60200400000020 nid: 10.8.11.22@o2ib6 remote: 0x406e0f546ac5cc86 expref: 93 pid: 54581 timeout: 352313 lvb_type: 0 [352179.751533] LustreError: 138-a: fir-MDT0002: A client on nid 10.8.11.22@o2ib6 was evicted due to a lock blocking callback time out: rc -110 [352179.764137] LustreError: Skipped 1 previous similar message [352179.769822] LustreError: 54055:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 154s: evicting client at 10.8.11.22@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff896b73709f80/0x1a4b7ac5168363e0 lrc: 3/0,0 mode: PR/PR res: [0x2c0003beb:0x8009:0x0].0x0 bits 0x13/0x0 rrc: 3 type: IBT flags: 0x60200400000020 nid: 10.8.11.22@o2ib6 remote: 0x406e0f546ac5cc86 expref: 94 pid: 54581 timeout: 0 lvb_type: 0 [352179.807249] LustreError: 54055:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message [352198.341677] Lustre: fir-MDT0000: haven't heard from client 14b8a5cc-c311-1ba9-3b8d-0cba9e5112d8 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8986ba3efc00, cur 1549869490 expire 1549869340 last 1549869263 [352198.363486] Lustre: Skipped 2 previous similar messages [352228.291738] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [352228.301852] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [352757.354646] Lustre: fir-MDT0000: haven't heard from client 4134d24b-e8b5-caf8-c4be-35abdb4083c3 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89636cfec000, cur 1549870049 expire 1549869899 last 1549869822 [352757.376438] Lustre: Skipped 1 previous similar message [352761.352620] Lustre: fir-MDT0002: haven't heard from client 4134d24b-e8b5-caf8-c4be-35abdb4083c3 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896533376800, cur 1549870053 expire 1549869903 last 1549869826 [352761.374431] Lustre: Skipped 1 previous similar message [352829.382938] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [352829.393046] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [352829.403396] Lustre: Skipped 3 previous similar messages [353430.480265] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [353430.490372] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [353430.500720] Lustre: Skipped 3 previous similar messages [354031.577472] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [354031.587600] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [354457.394923] Lustre: fir-MDT0002: haven't heard from client 047894be-2394-9241-fd94-a892fc926bf2 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff897507bcf000, cur 1549871749 expire 1549871599 last 1549871522 [354632.669714] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [354632.679824] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [354632.690184] Lustre: Skipped 3 previous similar messages [355149.417595] Lustre: fir-MDT0000: haven't heard from client 9b9fd958-7168-f6e9-10dc-dbc17783e1de (at 10.8.15.7@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8957c82e5800, cur 1549872441 expire 1549872291 last 1549872214 [355149.439301] Lustre: Skipped 2 previous similar messages [355153.412392] Lustre: fir-MDT0002: haven't heard from client 9b9fd958-7168-f6e9-10dc-dbc17783e1de (at 10.8.15.7@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8987c6248c00, cur 1549872445 expire 1549872295 last 1549872218 [355153.434093] Lustre: Skipped 1 previous similar message [355233.767170] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [355233.777287] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [355834.859416] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [355834.869552] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [356233.439456] Lustre: fir-MDT0000: haven't heard from client 8e73ad7d-038e-2a91-afb3-265714da253a (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff896fd6fd3000, cur 1549873525 expire 1549873375 last 1549873298 [356435.950374] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [356435.960483] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [357037.041027] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [357037.051143] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [357037.061512] Lustre: Skipped 3 previous similar messages [357265.466429] Lustre: fir-MDT0000: haven't heard from client aa69108d-2cd6-ac93-4ae7-f875db9a6dc0 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8997893ae000, cur 1549874557 expire 1549874407 last 1549874330 [357265.488216] Lustre: Skipped 2 previous similar messages [357638.137556] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [357638.147668] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [357638.158020] Lustre: Skipped 3 previous similar messages [357890.482124] Lustre: fir-MDT0002: haven't heard from client d2f1ad7b-d247-38be-c80e-5fd306ed0938 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff898038aa3c00, cur 1549875182 expire 1549875032 last 1549874955 [357890.503916] Lustre: Skipped 2 previous similar messages [358239.233897] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [358239.244008] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [358239.254361] Lustre: Skipped 3 previous similar messages [358366.493554] Lustre: fir-MDT0002: haven't heard from client f0bdd448-136d-932f-1793-5407d72b6462 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff89876e76d800, cur 1549875658 expire 1549875508 last 1549875431 [358366.515363] Lustre: Skipped 2 previous similar messages [358840.330107] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [358840.340229] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [358840.350581] Lustre: Skipped 3 previous similar messages [358872.506110] Lustre: fir-MDT0000: haven't heard from client 77016c94-c604-ad13-9b84-bbf441a3043c (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897235292800, cur 1549876164 expire 1549876014 last 1549875937 [358872.527920] Lustre: Skipped 2 previous similar messages [359369.518334] Lustre: fir-MDT0002: haven't heard from client c48f0410-9307-d7ff-8701-741433c8b30e (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff898425bd1800, cur 1549876661 expire 1549876511 last 1549876434 [359369.540128] Lustre: Skipped 2 previous similar messages [359441.426596] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [359441.436705] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [359441.447067] Lustre: Skipped 3 previous similar messages [359919.533797] Lustre: fir-MDT0002: haven't heard from client b6a06598-c994-b92a-c784-00404c0160f2 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8972fe2dd400, cur 1549877211 expire 1549877061 last 1549876984 [359919.555603] Lustre: Skipped 2 previous similar messages [360042.523503] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [360042.533612] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [360042.543981] Lustre: Skipped 3 previous similar messages [360526.548409] Lustre: fir-MDT0002: haven't heard from client 155ed07e-8556-3158-5af0-db26220d8ae1 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89844f6ef000, cur 1549877818 expire 1549877668 last 1549877591 [360526.570221] Lustre: Skipped 2 previous similar messages [360643.621011] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [360643.631130] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [360643.641490] Lustre: Skipped 6 previous similar messages [361113.563374] Lustre: fir-MDT0002: haven't heard from client 2f0954da-a213-cb3a-28fa-395cb0b3a457 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89664eb34000, cur 1549878405 expire 1549878255 last 1549878178 [361113.585169] Lustre: Skipped 2 previous similar messages [361244.718547] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [361244.728659] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [361827.581225] Lustre: fir-MDT0002: haven't heard from client b17149e0-f4d0-7861-8f04-293d4bf02102 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff89816ba66400, cur 1549879119 expire 1549878969 last 1549878892 [361827.603035] Lustre: Skipped 2 previous similar messages [361845.810975] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [361845.821090] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [361845.831433] Lustre: Skipped 3 previous similar messages [361903.583267] Lustre: fir-MDT0002: haven't heard from client 3dbbb061-d93c-c7e8-88c8-e262ff513397 (at 10.8.14.6@o2ib6) in 211 seconds. I think it's dead, and I am evicting it. exp ffff89661925ac00, cur 1549879195 expire 1549879045 last 1549878984 [361903.604989] Lustre: Skipped 2 previous similar messages [362376.601128] Lustre: fir-MDT0000: haven't heard from client 79c829d7-b69b-d592-0b18-ba944f3d3d45 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89840cbcd000, cur 1549879668 expire 1549879518 last 1549879441 [362376.622943] Lustre: Skipped 2 previous similar messages [362446.908055] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [362446.918166] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [362446.928529] Lustre: Skipped 3 previous similar messages [362901.606812] Lustre: fir-MDT0000: haven't heard from client 7b79d0e2-7d96-494a-23c5-d664fb94353a (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff898674261000, cur 1549880193 expire 1549880043 last 1549879966 [362901.628624] Lustre: Skipped 2 previous similar messages [363048.004596] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [363048.014709] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [363048.025053] Lustre: Skipped 6 previous similar messages [363335.617798] Lustre: fir-MDT0002: haven't heard from client 7939e273-a44d-c342-0c2f-14af86d31167 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89775f7c5c00, cur 1549880627 expire 1549880477 last 1549880400 [363335.639601] Lustre: Skipped 2 previous similar messages [363649.100837] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [363649.110956] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [363649.121306] Lustre: Skipped 3 previous similar messages [363899.631851] Lustre: fir-MDT0002: haven't heard from client a09d7781-bfe6-b29a-d4a6-237ce4594fd1 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff895d3aea0400, cur 1549881191 expire 1549881041 last 1549880964 [363899.653658] Lustre: Skipped 2 previous similar messages [364250.197016] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [364250.207129] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [364250.217473] Lustre: Skipped 3 previous similar messages [364385.644089] Lustre: fir-MDT0002: haven't heard from client dba19b0d-a73a-9666-ab00-14e2c4360649 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff895d4b640800, cur 1549881677 expire 1549881527 last 1549881450 [364385.665902] Lustre: Skipped 2 previous similar messages [364851.294282] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [364851.304391] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [364851.314755] Lustre: Skipped 3 previous similar messages [364893.658875] Lustre: fir-MDT0002: haven't heard from client e5cf781a-b2b2-298e-962f-c2788920b59d (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89732fea4400, cur 1549882185 expire 1549882035 last 1549881958 [364893.680671] Lustre: Skipped 2 previous similar messages [365452.391104] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [365452.401219] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [365452.411582] Lustre: Skipped 3 previous similar messages [365944.684994] Lustre: fir-MDT0002: haven't heard from client ce537b46-fded-c277-fe11-f1c21a81918b (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff895d362d1000, cur 1549883236 expire 1549883086 last 1549883009 [365944.706786] Lustre: Skipped 2 previous similar messages [366053.488522] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [366053.498635] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [366654.580180] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [366654.590289] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [366654.600635] Lustre: Skipped 3 previous similar messages [366949.708835] Lustre: fir-MDT0002: haven't heard from client 43c2ca0b-6906-1031-26a5-d79023293574 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8984d2351400, cur 1549884241 expire 1549884091 last 1549884014 [366949.730639] Lustre: Skipped 2 previous similar messages [367255.677804] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [367255.687919] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [367255.698298] Lustre: Skipped 3 previous similar messages [367856.775095] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [367856.785204] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [368457.865954] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [368457.876066] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [368660.752081] Lustre: fir-MDT0002: haven't heard from client 245a049e-9b3f-8f27-db3b-53ec4ff00d6a (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff896df16f8800, cur 1549885952 expire 1549885802 last 1549885725 [368660.773886] Lustre: Skipped 2 previous similar messages [369058.957560] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [369058.967696] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [369058.978070] Lustre: Skipped 3 previous similar messages [369551.417657] Lustre: 56069:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549886835/real 1549886835] req@ffff89774e6e2a00 x1624927237237184/t0(0) o106->fir-MDT0000@10.8.9.8@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1549886842 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 [369551.444856] Lustre: 56069:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message [369572.454190] Lustre: 56069:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549886856/real 1549886856] req@ffff89774e6e2a00 x1624927237237184/t0(0) o106->fir-MDT0000@10.8.9.8@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1549886863 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [369572.481375] Lustre: 56069:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages [369607.492067] Lustre: 56069:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549886891/real 1549886891] req@ffff89774e6e2a00 x1624927237237184/t0(0) o106->fir-MDT0000@10.8.9.8@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1549886898 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [369607.519249] Lustre: 56069:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages [369660.053919] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [369660.064030] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [369677.532826] Lustre: 
56069:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549886961/real 1549886961] req@ffff89774e6e2a00 x1624927237237184/t0(0) o106->fir-MDT0000@10.8.9.8@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1549886968 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [369677.560011] Lustre: 56069:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 9 previous similar messages [369698.574105] LustreError: 56069:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.9.8@o2ib6) returned error from glimpse AST (req@ffff89774e6e2a00 x1624927237237184 status -107 rc -107), evict it ns: mdt-fir-MDT0000_UUID lock: ffff8973bbf745c0/0x1a4b7ac545c8a19b lrc: 4/0,0 mode: PW/PW res: [0x2000036fe:0x114:0x0].0x0 bits 0x40/0x0 rrc: 7 type: IBT flags: 0x40200000000000 nid: 10.8.9.8@o2ib6 remote: 0xf62a009831cb0c39 expref: 35 pid: 63808 timeout: 0 lvb_type: 0 [369698.616285] LustreError: 138-a: fir-MDT0000: A client on nid 10.8.9.8@o2ib6 was evicted due to a lock glimpse callback time out: rc -107 [369698.628655] LustreError: 54055:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 1150s: evicting client at 10.8.9.8@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8973bbf745c0/0x1a4b7ac545c8a19b lrc: 4/0,0 mode: PW/PW res: [0x2000036fe:0x114:0x0].0x0 bits 0x40/0x0 rrc: 7 type: IBT flags: 0x40200000000000 nid: 10.8.9.8@o2ib6 remote: 0xf62a009831cb0c39 expref: 36 pid: 63808 timeout: 0 lvb_type: 0 [369761.779680] Lustre: fir-MDT0002: haven't heard from client 6b80a9a6-a6c1-58c4-7b30-59f57c9c0b92 (at 10.8.9.8@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8995eeea4000, cur 1549887053 expire 1549886903 last 1549886826 [369761.801316] Lustre: Skipped 2 previous similar messages [369837.781026] Lustre: fir-MDT0002: haven't heard from client 975ea4f2-c7a8-67b2-2e53-2ddb935cc0e9 (at 10.8.11.22@o2ib6) in 165 seconds. I think it's dead, and I am evicting it. 
exp ffff89762b764400, cur 1549887129 expire 1549886979 last 1549886964 [369837.802838] Lustre: Skipped 1 previous similar message [369899.783187] Lustre: fir-MDT0000: haven't heard from client 975ea4f2-c7a8-67b2-2e53-2ddb935cc0e9 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896cf9a70400, cur 1549887191 expire 1549887041 last 1549886964 [370261.145089] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [370261.155206] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [370261.165577] Lustre: Skipped 6 previous similar messages [370862.241474] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [370862.251580] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [371463.332426] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [371463.342538] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [372064.423782] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [372064.433895] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [372369.845709] Lustre: fir-MDT0002: haven't heard from client 66c67ebd-4385-f2d6-e9cf-17c29bb89b59 (at 10.8.18.35@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff89739e3d5c00, cur 1549889661 expire 1549889511 last 1549889434 [372369.867517] Lustre: Skipped 1 previous similar message [372665.515359] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [372665.525470] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [372665.535837] Lustre: Skipped 3 previous similar messages [373128.863666] Lustre: fir-MDT0002: haven't heard from client bf5df4b8-a017-ea27-0157-f62592da601e (at 10.8.18.35@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff898659b32800, cur 1549890420 expire 1549890270 last 1549890193 [373128.885482] Lustre: Skipped 2 previous similar messages [373266.612789] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [373266.622903] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [373266.633253] Lustre: Skipped 6 previous similar messages [373426.871894] Lustre: fir-MDT0002: haven't heard from client 562454be-07d4-c729-4bb3-fd16316c6cee (at 10.8.18.35@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89622ab61400, cur 1549890718 expire 1549890568 last 1549890491 [373426.893759] Lustre: Skipped 2 previous similar messages [373783.880734] Lustre: fir-MDT0002: haven't heard from client 997acf0a-3566-b8fa-04be-72609eea8dc5 (at 10.8.18.35@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff896f99b5ac00, cur 1549891075 expire 1549890925 last 1549890848 [373783.902519] Lustre: Skipped 2 previous similar messages [373867.711237] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [373867.721359] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [373867.731742] Lustre: Skipped 6 previous similar messages [374260.894090] Lustre: fir-MDT0000: haven't heard from client ec03dc16-94b2-39d3-b7d1-e95a92e3789a (at 10.8.18.35@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89869d60a400, cur 1549891552 expire 1549891402 last 1549891325 [374260.915879] Lustre: Skipped 2 previous similar messages [374468.807989] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [374468.818100] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [374468.828448] Lustre: Skipped 3 previous similar messages [374551.899596] Lustre: fir-MDT0000: haven't heard from client 949d0515-82aa-4143-6c9a-7b928a2d5095 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897096334800, cur 1549891843 expire 1549891693 last 1549891616 [374551.921385] Lustre: Skipped 2 previous similar messages [375069.904713] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [375069.914829] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [375069.925178] Lustre: Skipped 6 previous similar messages [375498.922971] Lustre: fir-MDT0002: haven't heard from client 6ef69a54-2164-5a52-b2ce-fca07a329ed2 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff89974328a000, cur 1549892790 expire 1549892640 last 1549892563 [375498.944761] Lustre: Skipped 5 previous similar messages [375671.001204] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [375671.011327] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [375671.021669] Lustre: Skipped 3 previous similar messages [375812.930995] Lustre: fir-MDT0002: haven't heard from client 4b318a91-1fd8-81e7-7fc0-cf4bd9d78675 (at 10.8.26.33@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff898485714400, cur 1549893104 expire 1549892954 last 1549892877 [375812.952785] Lustre: Skipped 2 previous similar messages [375888.932807] Lustre: fir-MDT0002: haven't heard from client 9809bfd1-827f-49ac-2741-68b8739a163c (at 10.8.18.35@o2ib6) in 169 seconds. I think it's dead, and I am evicting it. exp ffff897feaa0bc00, cur 1549893180 expire 1549893030 last 1549893011 [375888.954619] Lustre: Skipped 2 previous similar messages [375964.935085] Lustre: fir-MDT0002: haven't heard from client 028d6433-9e7d-1b84-c8b7-1bb2a8570ec4 (at 10.8.1.4@o2ib6) in 201 seconds. I think it's dead, and I am evicting it. exp ffff8997fa31a000, cur 1549893256 expire 1549893106 last 1549893055 [375964.956719] Lustre: Skipped 2 previous similar messages [376272.097760] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [376272.107884] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [376272.118229] Lustre: Skipped 3 previous similar messages [376576.950093] Lustre: fir-MDT0002: haven't heard from client 84597a88-1b66-94f8-9b1f-2be05a0adb1f (at 10.8.26.33@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8997cbb7c000, cur 1549893868 expire 1549893718 last 1549893641 [376576.971905] Lustre: Skipped 2 previous similar messages [376840.957173] Lustre: fir-MDT0002: haven't heard from client d863f6a6-22b5-d6d1-9e85-e87fb7a2072d (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896407ea8400, cur 1549894132 expire 1549893982 last 1549893905 [376840.978992] Lustre: Skipped 2 previous similar messages [376873.196054] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [376873.206170] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [376873.216523] Lustre: Skipped 3 previous similar messages [377474.291535] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [377474.301647] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [377474.311995] Lustre: Skipped 3 previous similar messages [378075.387392] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [378075.397503] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [378562.000766] Lustre: fir-MDT0002: haven't heard from client d2d1ad32-0bae-d301-940f-331201d0cf4d (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89779277c400, cur 1549895853 expire 1549895703 last 1549895626 [378562.022577] Lustre: Skipped 2 previous similar messages [378676.479676] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [378676.489803] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [379133.015538] Lustre: fir-MDT0002: haven't heard from client a897fd4a-55d1-835c-1959-c63e18a93dfc (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8997f9b44c00, cur 1549896424 expire 1549896274 last 1549896197 [379133.037349] Lustre: Skipped 2 previous similar messages [379277.574193] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [379277.584307] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [379277.594659] Lustre: Skipped 3 previous similar messages [379635.027918] Lustre: fir-MDT0002: haven't heard from client 17407aee-6ef6-fd16-00b0-40f05e3e5420 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897597f64800, cur 1549896926 expire 1549896776 last 1549896699 [379635.049710] Lustre: Skipped 2 previous similar messages [379878.671762] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [379878.681875] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [379878.692225] Lustre: Skipped 6 previous similar messages [380479.770130] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [380479.780238] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [381080.862280] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [381080.872393] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [381681.954925] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [381681.965054] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [381681.975417] Lustre: Skipped 3 previous similar messages [381861.083607] Lustre: fir-MDT0002: haven't heard from client 2cd9229e-45de-6d19-ab59-71fcf9a392fe (at 10.8.18.35@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff896dbe758000, cur 1549899152 expire 1549899002 last 1549898925 [381861.105412] Lustre: Skipped 2 previous similar messages [382053.088562] Lustre: fir-MDT0002: haven't heard from client c22e763b-2712-8624-a4bb-1c3145d32fd9 (at 10.8.1.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8997fc293400, cur 1549899344 expire 1549899194 last 1549899117 [382053.110265] Lustre: Skipped 2 previous similar messages [382283.051417] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [382283.061526] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [382283.071872] Lustre: Skipped 3 previous similar messages [382884.147698] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [382884.157806] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [383355.121527] Lustre: fir-MDT0000: haven't heard from client da5d744a-3254-d5e6-8df6-4eacf76116cd (at 10.8.18.35@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8981867d6000, cur 1549900646 expire 1549900496 last 1549900419 [383355.143337] Lustre: Skipped 2 previous similar messages [383485.240858] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [383485.250970] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [383485.261315] Lustre: Skipped 3 previous similar messages [383557.126148] Lustre: fir-MDT0000: haven't heard from client 03a37b2d-0fa0-08c5-eb22-42785e11e92d (at 10.8.18.35@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff89957c364800, cur 1549900848 expire 1549900698 last 1549900621 [383557.147961] Lustre: Skipped 2 previous similar messages [383989.136768] Lustre: fir-MDT0000: haven't heard from client a58b2d33-e272-47e2-16a2-0e5c418cbe27 (at 10.8.26.33@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896ffebc8000, cur 1549901280 expire 1549901130 last 1549901053 [383989.158557] Lustre: Skipped 2 previous similar messages [384086.337162] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [384086.347273] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [384086.357635] Lustre: Skipped 6 previous similar messages [384261.146629] Lustre: MGS: haven't heard from client 75fb711f-a7e9-fab1-385e-131f37ed0d89 (at 10.8.26.33@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89833826e400, cur 1549901552 expire 1549901402 last 1549901325 [384261.167747] Lustre: Skipped 2 previous similar messages [384456.148111] Lustre: fir-MDT0002: haven't heard from client ab1f55c6-25d8-b2cf-818d-4cc69ca36dd0 (at 10.8.22.28@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896184a48c00, cur 1549901747 expire 1549901597 last 1549901520 [384456.169922] Lustre: Skipped 2 previous similar messages [384609.152056] Lustre: fir-MDT0000: haven't heard from client 4b8453cd-aefc-d81d-671f-39121770f943 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff897011e5f400, cur 1549901900 expire 1549901750 last 1549901673 [384609.173866] Lustre: Skipped 2 previous similar messages [384631.298092] Lustre: 54716:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549901915/real 1549901915] req@ffff898256bf6300 x1624927420125824/t0(0) o104->fir-MDT0000@10.8.18.35@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549901922 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 [384631.325455] Lustre: 54716:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages [384652.335607] Lustre: 54716:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549901936/real 1549901936] req@ffff898256bf6300 x1624927420125824/t0(0) o104->fir-MDT0000@10.8.18.35@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549901943 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [384652.362950] Lustre: 54716:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages [384685.153855] Lustre: fir-MDT0000: haven't heard from client eedb9fc4-eb23-5c2e-0221-01d21528a876 (at 10.8.18.35@o2ib6) in 164 seconds. I think it's dead, and I am evicting it. exp ffff898481f36000, cur 1549901976 expire 1549901826 last 1549901812 [384685.175646] Lustre: Skipped 2 previous similar messages [384687.432990] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [384687.443113] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [384687.453473] Lustre: Skipped 6 previous similar messages [385139.165941] Lustre: fir-MDT0000: haven't heard from client ebc348ae-5a1a-6bcc-1254-ed696c20d527 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8958042b3000, cur 1549902430 expire 1549902280 last 1549902203 [385139.187746] Lustre: Skipped 2 previous similar messages [385288.530190] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [385288.540303] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [385288.550667] Lustre: Skipped 9 previous similar messages [385599.177998] Lustre: fir-MDT0000: haven't heard from client 37bf9c8c-52f6-ddcc-ad24-ef4d27fc2542 (at 10.8.1.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8997a8f0c000, cur 1549902890 expire 1549902740 last 1549902663 [385599.199707] Lustre: Skipped 2 previous similar messages [385889.627538] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [385889.637661] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [385889.648041] Lustre: Skipped 3 previous similar messages [386490.724961] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [386490.735071] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [387091.816393] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [387091.826525] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [387103.214119] Lustre: fir-MDT0000: haven't heard from client c6dcbb20-86ec-247f-bf25-b411ffa56f82 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8997b2210800, cur 1549904394 expire 1549904244 last 1549904167 [387103.235907] Lustre: Skipped 5 previous similar messages [387692.907856] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [387692.917970] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [387692.928331] Lustre: Skipped 3 previous similar messages [387781.231948] Lustre: fir-MDT0000: haven't heard from client 460ae552-4e39-8e7b-d7f5-5ba3d2e4cf2e (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897007fbbc00, cur 1549905072 expire 1549904922 last 1549904845 [387781.253757] Lustre: Skipped 2 previous similar messages [388294.005154] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [388294.015292] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [388294.025679] Lustre: Skipped 3 previous similar messages [388895.101936] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [388895.112064] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [388935.260800] Lustre: fir-MDT0000: haven't heard from client 71866cd3-749e-3d62-ea2c-232151da7f87 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff896589b85400, cur 1549906226 expire 1549906076 last 1549905999 [388935.282610] Lustre: Skipped 2 previous similar messages [389496.192580] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [389496.202687] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [389496.213033] Lustre: Skipped 3 previous similar messages [389927.286310] Lustre: fir-MDT0000: haven't heard from client 9b65e72c-312e-566d-e836-8c153a61b356 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8984e174a400, cur 1549907218 expire 1549907068 last 1549906991 [389927.308114] Lustre: Skipped 2 previous similar messages [390097.288937] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [390097.299050] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [390698.379060] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [390698.389172] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [390698.399537] Lustre: Skipped 3 previous similar messages [391165.318738] Lustre: MGS: haven't heard from client 9cd249fe-7138-cfeb-4d45-a002e91676eb (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896b55237000, cur 1549908456 expire 1549908306 last 1549908229 [391165.339835] Lustre: Skipped 2 previous similar messages [391241.322908] Lustre: fir-MDT0002: haven't heard from client d2b48667-6db8-1759-8de9-7b0d205e399c (at 10.8.18.35@o2ib6) in 160 seconds. I think it's dead, and I am evicting it. 
exp ffff89702bad7000, cur 1549908532 expire 1549908382 last 1549908372 [391241.344700] Lustre: Skipped 2 previous similar messages [391299.475239] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [391299.485351] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [391299.495705] Lustre: Skipped 6 previous similar messages [391317.320160] Lustre: fir-MDT0000: haven't heard from client 5c84c86a-dd88-5e2f-b69b-7cfa44d838f1 (at 10.8.8.30@o2ib6) in 191 seconds. I think it's dead, and I am evicting it. exp ffff896c22741c00, cur 1549908608 expire 1549908458 last 1549908417 [391317.341863] Lustre: Skipped 2 previous similar messages [391628.327792] Lustre: fir-MDT0000: haven't heard from client 7f433a33-ccc8-11e9-ad5b-6ddd60ac1bc5 (at 10.8.18.35@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896b5925cc00, cur 1549908919 expire 1549908769 last 1549908692 [391628.349587] Lustre: Skipped 5 previous similar messages [391820.332657] Lustre: fir-MDT0000: haven't heard from client e72081dd-9ee7-f445-971a-456ce1cd7c85 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8967dbd87000, cur 1549909111 expire 1549908961 last 1549908884 [391820.354467] Lustre: Skipped 2 previous similar messages [391896.335183] Lustre: fir-MDT0000: haven't heard from client 7747e685-e761-7cc3-41b6-aa87888ca308 (at 10.8.18.35@o2ib6) in 220 seconds. I think it's dead, and I am evicting it. 
exp ffff89642de91c00, cur 1549909187 expire 1549909037 last 1549908967 [391896.356972] Lustre: Skipped 2 previous similar messages [391900.571662] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [391900.581772] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [391900.592142] Lustre: Skipped 18 previous similar messages [392307.344761] Lustre: fir-MDT0000: haven't heard from client c18f5619-f40c-54ba-42bb-990b753448df (at 10.8.10.36@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89607dabb400, cur 1549909598 expire 1549909448 last 1549909371 [392307.366573] Lustre: Skipped 2 previous similar messages [392383.346908] Lustre: fir-MDT0000: haven't heard from client 6ac883fb-a075-ce29-9340-b3f63fbb31e4 (at 10.8.11.22@o2ib6) in 208 seconds. I think it's dead, and I am evicting it. exp ffff896b55a00400, cur 1549909674 expire 1549909524 last 1549909466 [392383.368702] Lustre: Skipped 2 previous similar messages [392459.348580] Lustre: fir-MDT0000: haven't heard from client d007d714-686e-031f-faaf-29f5963c32fd (at 10.8.24.24@o2ib6) in 162 seconds. I think it's dead, and I am evicting it. exp ffff896b4a720c00, cur 1549909750 expire 1549909600 last 1549909588 [392459.370369] Lustre: Skipped 2 previous similar messages [392501.668530] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [392501.678646] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [392501.689025] Lustre: Skipped 24 previous similar messages [392924.360747] Lustre: fir-MDT0000: haven't heard from client a921c892-9c64-1065-f2af-c6f82088d146 (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff896b4a28b800, cur 1549910215 expire 1549910065 last 1549909988 [392924.382559] Lustre: Skipped 5 previous similar messages [393102.765579] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [393102.775693] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [393102.786061] Lustre: Skipped 15 previous similar messages [393269.368953] Lustre: fir-MDT0002: haven't heard from client 6a38ff8b-b3fe-82fa-fe0d-464544c990e9 (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897f9864ec00, cur 1549910560 expire 1549910410 last 1549910333 [393269.390744] Lustre: Skipped 8 previous similar messages [393703.862765] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [393703.872881] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [393703.883233] Lustre: Skipped 18 previous similar messages [393892.384657] Lustre: fir-MDT0002: haven't heard from client d7b5cf6d-76c4-c724-0a7d-d27d9c5017d1 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8970c5ee2c00, cur 1549911183 expire 1549911033 last 1549910956 [393892.406454] Lustre: Skipped 17 previous similar messages [394304.959873] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [394304.969983] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [394304.980330] Lustre: Skipped 16 previous similar messages [394494.402094] Lustre: fir-MDT0000: haven't heard from client 5692b870-3177-39c9-0cae-1965e22d6f5f (at 10.8.24.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff896de5b72400, cur 1549911785 expire 1549911635 last 1549911558 [394494.423888] Lustre: Skipped 8 previous similar messages [394906.056985] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [394906.067096] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [394906.077441] Lustre: Skipped 9 previous similar messages [395507.154109] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [395507.164230] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [395832.433425] Lustre: fir-MDT0000: haven't heard from client f8486c01-856a-fa2e-40e0-963cab9548e7 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8975006a3000, cur 1549913123 expire 1549912973 last 1549912896 [395832.455218] Lustre: Skipped 11 previous similar messages [396026.453029] Lustre: 54715:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549913310/real 1549913310] req@ffff8962ffa8bc00 x1624927571437552/t0(0) o106->fir-MDT0000@10.8.13.14@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1549913317 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 [396026.480388] Lustre: 54715:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages [396033.490201] Lustre: 54715:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549913317/real 1549913317] req@ffff8962ffa8bc00 x1624927571437552/t0(0) o106->fir-MDT0000@10.8.13.14@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1549913324 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [396047.517558] Lustre: 54715:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549913331/real 1549913331] req@ffff8962ffa8bc00 x1624927571437552/t0(0) o106->fir-MDT0000@10.8.13.14@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1549913338 ref 1 fl 
Rpc:X/2/ffffffff rc 0/-1 [396047.544912] Lustre: 54715:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message [396068.555081] Lustre: 54715:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549913352/real 1549913352] req@ffff8962ffa8bc00 x1624927571437552/t0(0) o106->fir-MDT0000@10.8.13.14@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1549913359 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [396068.582414] Lustre: 54715:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages [396102.496937] Lustre: 54708:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549913386/real 1549913386] req@ffff897272755d00 x1624927572052800/t0(0) o106->fir-MDT0000@10.8.13.14@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1549913393 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [396102.524296] Lustre: 54708:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 8 previous similar messages [396108.245337] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [396108.255445] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [396108.265791] Lustre: Skipped 6 previous similar messages [396166.598545] Lustre: 54715:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549913450/real 1549913450] req@ffff8962ffa8bc00 x1624927571437552/t0(0) o106->fir-MDT0000@10.8.13.14@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1549913457 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [396166.625917] Lustre: 54715:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 29 previous similar messages [396219.900874] LNet: Service thread pid 54715 was inactive for 200.44s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: [396219.917901] Pid: 54715, comm: mdt00_017 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 [396219.927726] Call Trace: [396219.930273] [] ptlrpc_set_wait+0x500/0x8d0 [ptlrpc] [396219.936980] [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] [396219.943739] [] ldlm_glimpse_locks+0x3b/0x100 [ptlrpc] [396219.950618] [] mdt_do_glimpse+0x1e9/0x4c0 [mdt] [396219.956948] [] mdt_glimpse_enqueue+0x3d3/0x4f0 [mdt] [396219.963705] [] mdt_intent_glimpse+0x1f/0x30 [mdt] [396219.970182] [] mdt_intent_policy+0x2e8/0xd00 [mdt] [396219.976758] [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] [396219.983607] [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] [396219.990816] [] tgt_enqueue+0x62/0x210 [ptlrpc] [396219.997069] [] tgt_request_handle+0xaea/0x1580 [ptlrpc] [396220.004110] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [396220.011929] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] [396220.018360] [] kthread+0xd1/0xe0 [396220.023377] [] ret_from_fork_nospec_begin+0xe/0x21 [396220.029962] [] 0xffffffffffffffff [396220.035076] LustreError: dumping log to /tmp/lustre-log.1549913510.54715 [396231.443311] Lustre: fir-MDT0002: haven't heard from client 3a1c20a5-0875-7243-a0de-f331a7ef0347 (at 10.8.13.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89632f759c00, cur 1549913522 expire 1549913372 last 1549913295 [396231.465116] Lustre: Skipped 4 previous similar messages [396231.470707] LNet: Service thread pid 54715 completed after 212.01s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). [396588.453417] Lustre: fir-MDT0002: haven't heard from client f361400c-7251-efb6-0ef8-526e70c50c93 (at 10.8.30.36@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff896619259000, cur 1549913879 expire 1549913729 last 1549913652 [396588.475224] Lustre: Skipped 5 previous similar messages [396709.342686] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [396709.352801] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [396709.363166] Lustre: Skipped 9 previous similar messages [397021.463026] Lustre: fir-MDT0002: haven't heard from client 71464f83-f435-3a33-e9d6-ef54166e95b7 (at 10.8.30.36@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8963c663e800, cur 1549914312 expire 1549914162 last 1549914085 [397021.484815] Lustre: Skipped 5 previous similar messages [397310.440373] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [397310.450484] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [397310.460855] Lustre: Skipped 3 previous similar messages [397911.537746] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [397911.547858] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [397911.558214] Lustre: Skipped 9 previous similar messages [398255.493969] Lustre: fir-MDT0000: haven't heard from client 1c165ad6-3d34-e17c-f84d-cefa71258e01 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89973bacd000, cur 1549915546 expire 1549915396 last 1549915319 [398255.515781] Lustre: Skipped 11 previous similar messages [398473.500657] Lustre: fir-MDT0002: haven't heard from client de7cbe59-91ed-9cdd-78f1-994518b60de9 (at 10.8.13.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8987bbb3e800, cur 1549915764 expire 1549915614 last 1549915537 [398473.522449] Lustre: Skipped 2 previous similar messages [398512.635038] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [398512.645148] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [398512.655519] Lustre: Skipped 3 previous similar messages [398650.803885] Lustre: 57446:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549915934/real 1549915934] req@ffff897f7f2aad00 x1624927610104048/t0(0) o106->fir-MDT0000@10.8.11.22@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1549915941 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 [398650.831248] Lustre: 57446:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 36 previous similar messages [398671.842412] Lustre: 57446:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549915955/real 1549915955] req@ffff897f7f2aad00 x1624927610104048/t0(0) o106->fir-MDT0000@10.8.11.22@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1549915962 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [398671.869752] Lustre: 57446:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages [398706.882291] Lustre: 57446:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549915990/real 1549915990] req@ffff897f7f2aad00 x1624927610104048/t0(0) o106->fir-MDT0000@10.8.11.22@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1549915997 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 [398706.909658] Lustre: 57446:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 9 previous similar messages [398719.506305] Lustre: fir-MDT0002: haven't heard from client 04474950-fa62-afa2-d07b-1c911223f7bd (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff89967328d800, cur 1549916010 expire 1549915860 last 1549915783 [398719.528095] Lustre: Skipped 2 previous similar messages [399022.532602] Lustre: MGS: haven't heard from client be351485-d98a-de42-73d7-9e662ffd5c0d (at 10.8.18.35@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897f7eb0c800, cur 1549916313 expire 1549916163 last 1549916086 [399022.553727] Lustre: Skipped 5 previous similar messages [399113.731485] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [399113.741593] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [399113.751964] Lustre: Skipped 9 previous similar messages [399714.828039] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [399714.838158] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [399714.848525] Lustre: Skipped 9 previous similar messages [399855.534266] Lustre: fir-MDT0002: haven't heard from client 0e96810b-d4c3-0ba3-47fc-1e6ff4eeb00f (at 10.8.13.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff896c88b5b800, cur 1549917146 expire 1549916996 last 1549916919
[399855.556058] Lustre: Skipped 11 previous similar messages
[400265.366409] Lustre: 63808:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549917548/real 1549917548] req@ffff896c873da700 x1624927635232896/t0(0) o104->fir-MDT0002@10.8.15.1@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549917555 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[400265.393664] Lustre: 63808:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages
[400279.404759] Lustre: 63808:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549917562/real 1549917562] req@ffff896c873da700 x1624927635232896/t0(0) o104->fir-MDT0002@10.8.15.1@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549917569 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[400279.432012] Lustre: 63808:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages
[400300.442283] Lustre: 63808:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549917583/real 1549917583] req@ffff896c873da700 x1624927635232896/t0(0) o104->fir-MDT0002@10.8.15.1@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549917590 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[400300.469533] Lustre: 63808:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages
[400315.924252] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[400315.934367] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[400315.944729] Lustre: Skipped 12 previous similar messages
[400332.555094] Lustre: 54478:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549917616/real 1549917616] req@ffff8995fe6cc800 x1624927636108224/t0(0) o104->fir-MDT0002@10.8.15.1@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549917623 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[400332.582351] Lustre: 54478:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 13 previous similar messages
[400363.483884] LustreError: 63808:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.15.1@o2ib6) failed to reply to blocking AST (req@ffff896c873da700 x1624927635232896 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff898464e50000/0x1a4b7ac58f3a1ca2 lrc: 4/0,0 mode: PR/PR res: [0x2c0001745:0x92f3:0x0].0x0 bits 0x13/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.8.15.1@o2ib6 remote: 0xf72f7af5e5f0f89a expref: 3046 pid: 54732 timeout: 400445 lvb_type: 0
[400363.526675] LustreError: 138-a: fir-MDT0002: A client on nid 10.8.15.1@o2ib6 was evicted due to a lock blocking callback time out: rc -110
[400363.539218] LustreError: 54055:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 105s: evicting client at 10.8.15.1@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff898464e50000/0x1a4b7ac58f3a1ca2 lrc: 3/0,0 mode: PR/PR res: [0x2c0001745:0x92f3:0x0].0x0 bits 0x13/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.8.15.1@o2ib6 remote: 0xf72f7af5e5f0f89a expref: 3047 pid: 54732 timeout: 0 lvb_type: 0
[400917.020472] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[400917.030588] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[400917.040941] Lustre: Skipped 3 previous similar messages
[401044.564051] Lustre: fir-MDT0002: haven't heard from client 20b70446-f1e0-88e5-0d8a-6c82385b1f89 (at 10.8.13.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8997c0f2d000, cur 1549918335 expire 1549918185 last 1549918108
[401044.585861] Lustre: Skipped 10 previous similar messages
[401518.116749] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[401518.126875] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[401518.137229] Lustre: Skipped 6 previous similar messages
[401661.579499] Lustre: fir-MDT0002: haven't heard from client 7d45040c-87ee-f986-2ff8-b81743934a66 (at 10.8.13.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89667e2dcc00, cur 1549918952 expire 1549918802 last 1549918725
[401661.601309] Lustre: Skipped 2 previous similar messages
[402119.213523] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[402119.223668] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[402119.234016] Lustre: Skipped 6 previous similar messages
[402266.594719] Lustre: fir-MDT0000: haven't heard from client c8d4a74b-4b84-19dd-7aee-212d1511d448 (at 10.8.13.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89858beeac00, cur 1549919557 expire 1549919407 last 1549919330
[402266.616505] Lustre: Skipped 5 previous similar messages
[402720.310632] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[402720.320741] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[402720.331086] Lustre: Skipped 6 previous similar messages
[403147.617732] Lustre: fir-MDT0002: haven't heard from client d5e4ed35-a88c-665e-f99f-2b1c5dab2580 (at 10.8.13.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8981bf2fcc00, cur 1549920438 expire 1549920288 last 1549920211
[403147.639522] Lustre: Skipped 5 previous similar messages
[403321.407845] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[403321.417962] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[403922.499213] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[403922.509339] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[403922.519712] Lustre: Skipped 3 previous similar messages
[404442.650298] Lustre: fir-MDT0002: haven't heard from client 9c7eabdd-d06f-2e01-efa6-a2befd4655bc (at 10.8.13.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896489b85000, cur 1549921733 expire 1549921583 last 1549921506
[404442.672091] Lustre: Skipped 5 previous similar messages
[404523.596363] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[404523.606476] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[404523.616827] Lustre: Skipped 3 previous similar messages
[404948.662004] Lustre: fir-MDT0000: haven't heard from client 3ce87805-d426-b064-87f2-477a120a36be (at 10.8.26.33@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897665af2400, cur 1549922239 expire 1549922089 last 1549922012
[404948.683794] Lustre: Skipped 2 previous similar messages
[405124.693263] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[405124.703376] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[405124.713724] Lustre: Skipped 12 previous similar messages
[405725.790015] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[405725.800131] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[406313.895186] Lustre: 63799:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549923597/real 1549923597] req@ffff897f4eed8600 x1624927733714080/t0(0) o104->fir-MDT0000@10.8.9.8@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549923604 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[406313.922369] Lustre: 63799:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 22 previous similar messages
[406326.881716] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[406326.891830] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[406327.932537] Lustre: 63799:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549923611/real 1549923611] req@ffff897f4eed8600 x1624927733714080/t0(0) o104->fir-MDT0000@10.8.9.8@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549923618 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[406327.959704] Lustre: 63799:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message
[406348.971070] Lustre: 63799:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549923632/real 1549923632] req@ffff897f4eed8600 x1624927733714080/t0(0) o104->fir-MDT0000@10.8.9.8@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549923639 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[406348.998233] Lustre: 63799:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
[406384.009941] Lustre: 63799:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549923667/real 1549923667] req@ffff897f4eed8600 x1624927733714080/t0(0) o104->fir-MDT0000@10.8.9.8@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549923674 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[406384.037108] Lustre: 63799:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages
[406454.050702] Lustre: 63799:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549923737/real 1549923737] req@ffff897f4eed8600 x1624927733714080/t0(0) o104->fir-MDT0000@10.8.9.8@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549923744 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[406454.077873] Lustre: 63799:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 9 previous similar messages
[406461.087896] LustreError: 63799:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.9.8@o2ib6) failed to reply to blocking AST (req@ffff897f4eed8600 x1624927733714080 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff895813803f00/0x1a4b7ac5611f7ee3 lrc: 4/0,0 mode: PR/PR res: [0x2000018a2:0xed0:0x0].0x0 bits 0x13/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.8.9.8@o2ib6 remote: 0x69b435de38c3330d expref: 40 pid: 54482 timeout: 406593 lvb_type: 0
[406461.130266] LustreError: 138-a: fir-MDT0000: A client on nid 10.8.9.8@o2ib6 was evicted due to a lock blocking callback time out: rc -110
[406461.142738] LustreError: 54055:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 154s: evicting client at 10.8.9.8@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff895813803f00/0x1a4b7ac5611f7ee3 lrc: 3/0,0 mode: PR/PR res: [0x2000018a2:0xed0:0x0].0x0 bits 0x13/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.8.9.8@o2ib6 remote: 0x69b435de38c3330d expref: 41 pid: 54482 timeout: 0 lvb_type: 0
[406507.701398] Lustre: fir-MDT0002: haven't heard from client 4a72c43c-b639-3519-2806-50e34cdc9168 (at 10.8.9.8@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8961d1f78800, cur 1549923798 expire 1549923648 last 1549923571
[406507.723017] Lustre: Skipped 8 previous similar messages
[406927.972407] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[406927.982520] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[406927.992865] Lustre: Skipped 3 previous similar messages
[407277.720445] Lustre: fir-MDT0000: haven't heard from client 99f1497f-9261-985d-d6ab-7dc1ba28e549 (at 10.8.13.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896482390000, cur 1549924568 expire 1549924418 last 1549924341
[407277.742231] Lustre: Skipped 1 previous similar message
[407529.069310] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[407529.079428] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[407529.089775] Lustre: Skipped 6 previous similar messages
[407699.731065] Lustre: fir-MDT0002: haven't heard from client cf1f0424-f9b2-5872-b7a5-1226e498747c (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff895c517d8000, cur 1549924990 expire 1549924840 last 1549924763
[407699.752857] Lustre: Skipped 5 previous similar messages
[407717.731527] Lustre: fir-MDT0000: haven't heard from client cf1f0424-f9b2-5872-b7a5-1226e498747c (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff895809529400, cur 1549925008 expire 1549924858 last 1549924781
[408047.739882] Lustre: fir-MDT0002: haven't heard from client ea48caf2-d28d-02e7-9c4c-77a3779f776e (at 10.8.13.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896cccf74800, cur 1549925338 expire 1549925188 last 1549925111
[408047.761690] Lustre: Skipped 1 previous similar message
[408130.166077] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[408130.176185] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[408130.186532] Lustre: Skipped 3 previous similar messages
[408548.752522] Lustre: fir-MDT0002: haven't heard from client a0e46e79-f6fb-ae91-b3d8-c3e65420d66f (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896c3af9ac00, cur 1549925839 expire 1549925689 last 1549925612
[408548.774317] Lustre: Skipped 2 previous similar messages
[408624.754415] Lustre: fir-MDT0000: haven't heard from client 4415a91e-f103-8faa-ade4-ef257f23c8de (at 10.8.13.14@o2ib6) in 172 seconds. I think it's dead, and I am evicting it. exp ffff8977e352b000, cur 1549925915 expire 1549925765 last 1549925743
[408624.776220] Lustre: Skipped 2 previous similar messages
[408731.263154] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[408731.273277] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[408731.283651] Lustre: Skipped 6 previous similar messages
[409237.769574] Lustre: fir-MDT0000: haven't heard from client 42449457-551d-3ac6-7ea9-0f539f9122fa (at 10.8.13.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896b14f64c00, cur 1549926528 expire 1549926378 last 1549926301
[409237.791366] Lustre: Skipped 2 previous similar messages
[409332.360367] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[409332.370499] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[409332.380896] Lustre: Skipped 3 previous similar messages
[409933.457310] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[409933.467425] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[409933.477779] Lustre: Skipped 3 previous similar messages
[410034.789591] Lustre: fir-MDT0002: haven't heard from client 9529402e-4e0a-2200-8f1b-ce619cee0112 (at 10.8.13.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8976d2392000, cur 1549927325 expire 1549927175 last 1549927098
[410034.811401] Lustre: Skipped 2 previous similar messages
[410534.554372] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[410534.564485] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[410534.574833] Lustre: Skipped 3 previous similar messages
[410782.808378] Lustre: fir-MDT0000: haven't heard from client f78bab40-3614-0037-9cc4-01415716e174 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896b2de5bc00, cur 1549928073 expire 1549927923 last 1549927846
[410782.830223] Lustre: Skipped 2 previous similar messages
[411135.651171] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[411135.661284] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[411135.671662] Lustre: Skipped 6 previous similar messages
[411200.725820] Lustre: 54581:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549928479/real 1549928479] req@ffff896a11ab3000 x1624927812835968/t0(0) o104->fir-MDT0000@10.8.13.14@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549928490 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[411200.753184] Lustre: 54581:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message
[411222.763382] Lustre: 54581:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549928501/real 1549928501] req@ffff896a11ab3000 x1624927812835968/t0(0) o104->fir-MDT0000@10.8.13.14@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549928512 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[411222.790747] Lustre: 54581:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message
[411255.802201] Lustre: 54581:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549928534/real 1549928534] req@ffff896a11ab3000 x1624927812835968/t0(0) o104->fir-MDT0000@10.8.13.14@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549928545 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[411255.829538] Lustre: 54581:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
[411321.841862] Lustre: 54581:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549928601/real 1549928601] req@ffff896a11ab3000 x1624927812835968/t0(0) o104->fir-MDT0000@10.8.13.14@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549928612 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[411321.869200] Lustre: 54581:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages
[411343.879437] LustreError: 54581:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.13.14@o2ib6) failed to reply to blocking AST (req@ffff896a11ab3000 x1624927812835968 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff8973416772c0/0x1a4b7ac6221529d4 lrc: 4/0,0 mode: PR/PR res: [0x200000406:0x4a8:0x0].0x0 bits 0x13/0x0 rrc: 89 type: IBT flags: 0x60200400000020 nid: 10.8.13.14@o2ib6 remote: 0x26fdc5dda798366e expref: 11 pid: 63808 timeout: 411471 lvb_type: 0
[411343.922222] LustreError: 138-a: fir-MDT0000: A client on nid 10.8.13.14@o2ib6 was evicted due to a lock blocking callback time out: rc -110
[411343.934845] LustreError: 54055:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 155s: evicting client at 10.8.13.14@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8973416772c0/0x1a4b7ac6221529d4 lrc: 3/0,0 mode: PR/PR res: [0x200000406:0x4a8:0x0].0x0 bits 0x13/0x0 rrc: 89 type: IBT flags: 0x60200400000020 nid: 10.8.13.14@o2ib6 remote: 0x26fdc5dda798366e expref: 12 pid: 63808 timeout: 0 lvb_type: 0
[411375.823226] Lustre: fir-MDT0000: haven't heard from client 8a98a266-b012-7ebf-085d-3c980fd5a763 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896a143ff400, cur 1549928666 expire 1549928516 last 1549928439
[411375.845011] Lustre: Skipped 5 previous similar messages
[411736.748006] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[411736.758116] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[411736.768469] Lustre: Skipped 6 previous similar messages
[412066.840674] Lustre: fir-MDT0002: haven't heard from client bce9232d-db13-620c-6248-ef924e4d0bfa (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89771cafec00, cur 1549929357 expire 1549929207 last 1549929130
[412066.862485] Lustre: Skipped 4 previous similar messages
[412157.738806] Lustre: Setting parameter -OST0000.obdfilter.fir-OST*.brw_size in log params
[412196.298758] Lustre: Modifying parameter osc.fir-OST*.osc.max_pages_per_rpc in log params
[412337.844909] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[412337.855019] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[412337.865365] Lustre: Skipped 3 previous similar messages
[412635.854940] Lustre: fir-MDT0002: haven't heard from client f61dd066-e93b-85e7-7f83-41d77a244dbd (at 10.8.13.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899685648000, cur 1549929926 expire 1549929776 last 1549929699
[412635.876728] Lustre: Skipped 2 previous similar messages
[412938.941783] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[412938.951889] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[412938.962254] Lustre: Skipped 6 previous similar messages
[413540.038781] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[413540.048888] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[413602.879181] Lustre: fir-MDT0000: haven't heard from client f936710e-e7d8-651c-2b1b-7cc8a68ad371 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896191a86800, cur 1549930893 expire 1549930743 last 1549930666
[413602.900993] Lustre: Skipped 5 previous similar messages
[413833.885545] Lustre: fir-MDT0002: haven't heard from client 32decdc2-1b4d-1897-17b3-aaf5745d906d (at 10.8.13.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff898297e0b800, cur 1549931124 expire 1549930974 last 1549930897
[413833.907333] Lustre: Skipped 2 previous similar messages
[414141.129665] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[414141.139782] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[414141.150152] Lustre: Skipped 6 previous similar messages
[414385.898873] Lustre: fir-MDT0002: haven't heard from client 4f7f13ab-0c77-335e-41a8-25ce073ce0aa (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89580956dc00, cur 1549931676 expire 1549931526 last 1549931449
[414385.920682] Lustre: Skipped 2 previous similar messages
[414573.903497] Lustre: fir-MDT0000: haven't heard from client 2d279b6b-ae49-37ac-0a12-0938de9dc4ca (at 10.8.1.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8983dd647000, cur 1549931864 expire 1549931714 last 1549931637
[414573.925195] Lustre: Skipped 2 previous similar messages
[414742.226757] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[414742.236872] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[414742.247241] Lustre: Skipped 6 previous similar messages
[415278.707694] LustreError: 54249:0:(fld_handler.c:225:fld_local_lookup()) srv-fir-MDT0000: FLD cache range [0x00000008c0000400-0x0000000900000400]:4:ost does not match requested flag 0: rc = -5
[415278.724807] LustreError: 54249:0:(fld_handler.c:264:fld_server_lookup()) srv-fir-MDT0000: Cannot find sequence 0x8c0000400: rc = -2
[415343.323801] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[415343.333915] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[415569.928474] Lustre: fir-MDT0000: haven't heard from client 3a75fbf7-aacb-f40c-a9e2-9a6f76b1e968 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8984e76ef400, cur 1549932860 expire 1549932710 last 1549932633
[415569.950279] Lustre: Skipped 2 previous similar messages
[415944.414863] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[415944.424973] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[415944.435324] Lustre: Skipped 6 previous similar messages
[416024.939953] Lustre: fir-MDT0000: haven't heard from client 5ca6ac72-6275-f0a0-7b87-639e081868ad (at 10.8.13.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8960da3ca800, cur 1549933315 expire 1549933165 last 1549933088
[416024.961740] Lustre: Skipped 5 previous similar messages
[416545.511887] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[416545.522007] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[416545.532376] Lustre: Skipped 3 previous similar messages
[417146.608895] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[417146.619015] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[417364.973556] Lustre: fir-MDT0000: haven't heard from client a8449bff-56f7-51ba-5adc-6abf8e452713 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896c34fb3800, cur 1549934655 expire 1549934505 last 1549934428
[417364.995343] Lustre: Skipped 2 previous similar messages
[417747.699965] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[417747.710074] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[417747.720426] Lustre: Skipped 3 previous similar messages
[417850.985776] Lustre: fir-MDT0000: haven't heard from client 93c777d8-19c8-6356-41d2-9bba7f65eb8f (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899610b6fc00, cur 1549935141 expire 1549934991 last 1549934914
[417851.007566] Lustre: Skipped 2 previous similar messages
[418124.985600] Lustre: 56073:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549935407/real 1549935407] req@ffff8994c02f0600 x1624927939075552/t0(0) o104->fir-MDT0000@10.8.13.14@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549935414 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[418125.012942] Lustre: 56073:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
[418146.023133] Lustre: 56073:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549935429/real 1549935429] req@ffff8994c02f0600 x1624927939075552/t0(0) o104->fir-MDT0000@10.8.13.14@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549935436 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[418146.050471] Lustre: 56073:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
[418181.062010] Lustre: 56073:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549935464/real 1549935464] req@ffff8994c02f0600 x1624927939075552/t0(0) o104->fir-MDT0000@10.8.13.14@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549935471 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[418181.089360] Lustre: 56073:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages
[418250.995735] Lustre: fir-MDT0000: haven't heard from client 4716fe2e-b162-9071-e50d-24885b29871a (at 10.8.13.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8961c8ecd000, cur 1549935541 expire 1549935391 last 1549935314
[418251.017528] Lustre: Skipped 2 previous similar messages
[418326.997658] Lustre: fir-MDT0000: haven't heard from client 670fc807-dd14-9be9-2373-4bbdc84964c5 (at 10.8.11.22@o2ib6) in 218 seconds. I think it's dead, and I am evicting it. exp ffff8962b2b13000, cur 1549935617 expire 1549935467 last 1549935399
[418327.019451] Lustre: Skipped 2 previous similar messages
[418335.043235] Lustre: MGS: haven't heard from client 3aa0d2bd-246b-20ea-fbda-92cbd4d23306 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899612b53400, cur 1549935625 expire 1549935475 last 1549935398
[418335.064338] Lustre: Skipped 1 previous similar message
[418348.797070] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[418348.807196] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[418348.817558] Lustre: Skipped 3 previous similar messages
[418814.009959] Lustre: fir-MDT0002: haven't heard from client 266a2d9e-b528-ee9e-45f8-a1ce8f70560a (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896589ae2c00, cur 1549936104 expire 1549935954 last 1549935877
[418949.894112] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[418949.904221] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[418949.914568] Lustre: Skipped 9 previous similar messages
[419221.020079] Lustre: fir-MDT0002: haven't heard from client e617b155-aa3a-8262-58f4-1d9ab77dbd2c (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896972af9c00, cur 1549936511 expire 1549936361 last 1549936284
[419221.041886] Lustre: Skipped 2 previous similar messages
[419550.991205] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[419551.001316] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[419551.011661] Lustre: Skipped 3 previous similar messages
[419716.032784] Lustre: fir-MDT0002: haven't heard from client b48cb1cb-71f0-84ae-e350-3e2bfee85624 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896967b81c00, cur 1549937006 expire 1549936856 last 1549936779
[419716.054572] Lustre: Skipped 2 previous similar messages
[420152.088289] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[420152.098431] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[420152.108803] Lustre: Skipped 3 previous similar messages
[420216.045052] Lustre: fir-MDT0002: haven't heard from client f425d37e-c125-4f6b-235a-7086b8c4eff9 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896966e86800, cur 1549937506 expire 1549937356 last 1549937279
[420216.066844] Lustre: Skipped 2 previous similar messages
[420674.056834] Lustre: fir-MDT0000: haven't heard from client 4ae4897a-8bdc-7a34-cbef-1e528133e141 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89695332c400, cur 1549937964 expire 1549937814 last 1549937737
[420674.078623] Lustre: Skipped 2 previous similar messages
[420753.185322] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[420753.195434] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[420753.205783] Lustre: Skipped 3 previous similar messages
[421048.066042] Lustre: fir-MDT0000: haven't heard from client d7ea7861-69ee-e29c-6aa8-d1128e6f908a (at 10.8.26.33@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897647696400, cur 1549938338 expire 1549938188 last 1549938111
[421048.087833] Lustre: Skipped 2 previous similar messages
[421354.282385] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[421354.292501] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[421354.302853] Lustre: Skipped 8 previous similar messages
[421955.379526] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[421955.389638] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[422252.096205] Lustre: fir-MDT0000: haven't heard from client ecc6f109-15ce-20bc-40e5-4957d4ec1dab (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897e11ed7c00, cur 1549939542 expire 1549939392 last 1549939315
[422252.118019] Lustre: Skipped 2 previous similar messages
[422556.471571] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[422556.481689] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[422556.492040] Lustre: Skipped 3 previous similar messages
[422701.107419] Lustre: fir-MDT0002: haven't heard from client b6d9ac42-6245-c8c8-28e4-f4f90666c36d (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896961e16400, cur 1549939991 expire 1549939841 last 1549939764
[422701.129206] Lustre: Skipped 4 previous similar messages
[423157.568569] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[423157.578684] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[423157.589034] Lustre: Skipped 3 previous similar messages
[423235.120827] Lustre: fir-MDT0000: haven't heard from client a6f3ed12-cfa6-928e-c47b-b2decb830de0 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8962fcba1000, cur 1549940525 expire 1549940375 last 1549940298
[423235.142630] Lustre: Skipped 2 previous similar messages
[423684.132094] Lustre: fir-MDT0002: haven't heard from client 9375d0c9-c5ee-bf70-22e3-e96d55a1c3f7 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89691bb77c00, cur 1549940974 expire 1549940824 last 1549940747
[423684.153926] Lustre: Skipped 2 previous similar messages
[423758.665555] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[423758.675670] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[423758.686044] Lustre: Skipped 3 previous similar messages
[424359.762643] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[424359.772755] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[424359.783118] Lustre: Skipped 6 previous similar messages
[424498.153536] Lustre: fir-MDT0000: haven't heard from client f841bf45-d0c7-2331-4e02-f9aac83d70db (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8968ff298c00, cur 1549941788 expire 1549941638 last 1549941561
[424498.175323] Lustre: Skipped 5 previous similar messages
[424960.859634] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[424960.869746] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[424960.880111] Lustre: Skipped 3 previous similar messages
[425425.176772] Lustre: fir-MDT0002: haven't heard from client e1d674ee-a2b9-c264-88c9-2f1a865d5b09 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff895d06b0d800, cur 1549942715 expire 1549942565 last 1549942488
[425425.198585] Lustre: Skipped 5 previous similar messages
[425465.087588] Lustre: Setting parameter fir-*.obdfilter.fir-*.readcache_max_filesize in log params
[425561.956586] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[425561.966698] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[425561.977065] Lustre: Skipped 6 previous similar messages
[426163.053531] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[426163.063643] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[426764.144477] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[426764.154589] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[427360.225698] Lustre: fir-MDT0000: haven't heard from client b4cdb74d-1373-4e5e-0458-2fab1774f617 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896900f7c400, cur 1549944650 expire 1549944500 last 1549944423
[427360.247502] Lustre: Skipped 2 previous similar messages
[427365.235655] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[427365.245814] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[427610.798205] LustreError: 55256:0:(mdt_io.c:442:mdt_preprw_write()) fir-MDT0000: BRW to missing obj [0x2000036d2:0xc8:0x0]
[427832.236708] Lustre: fir-MDT0000: haven't heard from client 5049d3eb-3c1a-afa0-eb07-45fcb7b6a2b7 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8968f2328c00, cur 1549945122 expire 1549944972 last 1549944895
[427832.258497] Lustre: Skipped 2 previous similar messages
[427966.327819] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[427966.337932] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[427966.348278] Lustre: Skipped 6 previous similar messages
[428224.247402] Lustre: fir-MDT0002: haven't heard from client c8e59f32-a41f-125e-b0df-9091342dd2b5 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8969377db800, cur 1549945514 expire 1549945364 last 1549945287
[428224.269190] Lustre: Skipped 2 previous similar messages
[428567.425217] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[428567.435329] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[428567.445696] Lustre: Skipped 3 previous similar messages
[428727.258676] Lustre: fir-MDT0002: haven't heard from client 1132224d-7fa8-98e5-e8a6-1c7847fb622d (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89676915e800, cur 1549946017 expire 1549945867 last 1549945790
[428727.280488] Lustre: Skipped 2 previous similar messages
[429168.522472] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[429168.532581] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[429168.542945] Lustre: Skipped 3 previous similar messages
[429758.284536] Lustre: fir-MDT0000: haven't heard from client eb9f1238-5a7e-bbe3-ed9b-13d4a55a928d (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8961a5e86400, cur 1549947048 expire 1549946898 last 1549946821
[429758.306348] Lustre: Skipped 5 previous similar messages
[429769.619672] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[429769.629788] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[429769.640140] Lustre: Skipped 3 previous similar messages
[430370.716808] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[430370.726924] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[430370.737273] Lustre: Skipped 3 previous similar messages
[430689.308454] Lustre: fir-MDT0002: haven't heard from client 5e8a09e9-5518-63e3-4864-b26fd545b7b6 (at 10.8.15.2@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896be7af0800, cur 1549947979 expire 1549947829 last 1549947752
[430689.330171] Lustre: Skipped 5 previous similar messages
[430971.813373] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[430971.823495] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[430971.833843] Lustre: Skipped 6 previous similar messages
[431572.910039] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[431572.920155] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[431572.930506] Lustre: Skipped 3 previous similar messages
[431623.331424] Lustre: fir-MDT0000: haven't heard from client dba63adb-6a8e-6cd9-54a6-c84c12865402 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it.
exp ffff89697a7a5400, cur 1549948913 expire 1549948763 last 1549948686 [431623.353216] Lustre: Skipped 8 previous similar messages [432174.006655] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [432174.016789] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [432174.027140] Lustre: Skipped 9 previous similar messages [432370.350561] Lustre: fir-MDT0002: haven't heard from client 61f112c2-83b6-0e81-d768-2d21fef76a1a (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897d9578f400, cur 1549949660 expire 1549949510 last 1549949433 [432370.372356] Lustre: Skipped 8 previous similar messages [432775.103262] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [432775.113369] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [432775.123715] Lustre: Skipped 6 previous similar messages [433132.370150] Lustre: MGS: haven't heard from client 88ed1fdb-bf3e-296b-b717-76e992c4946a (at 10.8.18.35@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8975e8780400, cur 1549950422 expire 1549950272 last 1549950195 [433132.391266] Lustre: Skipped 11 previous similar messages [433376.199797] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [433376.209922] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [433376.220294] Lustre: Skipped 9 previous similar messages [433977.296245] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [433977.306365] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [433977.316745] Lustre: Skipped 8 previous similar messages [434578.392647] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [434578.402758] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [435111.418847] Lustre: fir-MDT0000: haven't heard from client 240cabb7-5523-93ec-39a6-6984f05dd981 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896d53b3b400, cur 1549952401 expire 1549952251 last 1549952174 [435111.440641] Lustre: Skipped 8 previous similar messages [435179.484068] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [435179.494178] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [435566.430383] Lustre: fir-MDT0002: haven't heard from client 55625a39-821d-1d3c-a0f3-d15d0bc47864 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8977e606d000, cur 1549952856 expire 1549952706 last 1549952629 [435566.452195] Lustre: Skipped 4 previous similar messages [435780.575303] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [435780.585414] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [435780.595768] Lustre: Skipped 6 previous similar messages [436381.672039] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [436381.682162] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [436903.463929] Lustre: fir-MDT0000: haven't heard from client c49e9713-be58-954f-804e-7a03164cd71b (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897d7f64e000, cur 1549954193 expire 1549954043 last 1549953966 [436903.485723] Lustre: Skipped 2 previous similar messages [436982.764518] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [436982.774633] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [437358.475267] Lustre: fir-MDT0000: haven't heard from client 7f4c0d2b-87b2-4817-11fc-35386bfabbe0 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8961bda6f400, cur 1549954648 expire 1549954498 last 1549954421 [437358.497058] Lustre: Skipped 2 previous similar messages [437583.857089] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [437583.867211] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [437583.877581] Lustre: Skipped 6 previous similar messages [437611.709731] LustreError: 55226:0:(mdt_io.c:442:mdt_preprw_write()) fir-MDT0000: BRW to missing obj [0x200003745:0x943:0x0] [437754.485263] Lustre: fir-MDT0000: haven't heard from client ee4f158b-fd50-0cd9-0490-f2444de640f2 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896760535c00, cur 1549955044 expire 1549954894 last 1549954817 [437754.507057] Lustre: Skipped 2 previous similar messages [438184.954641] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [438184.964759] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [438184.975137] Lustre: Skipped 3 previous similar messages [438215.496828] Lustre: fir-MDT0002: haven't heard from client 19cf490f-03ee-9fc2-04c3-2c1eae55f005 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896208726400, cur 1549955505 expire 1549955355 last 1549955278 [438215.518624] Lustre: Skipped 2 previous similar messages [438680.508515] Lustre: fir-MDT0002: haven't heard from client 6a792197-aa42-2a4b-1f22-f0b77af45173 (at 10.8.13.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff897e536c1400, cur 1549955970 expire 1549955820 last 1549955743 [438680.530326] Lustre: Skipped 2 previous similar messages [438786.051594] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [438786.061709] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [438786.072078] Lustre: Skipped 6 previous similar messages [439082.518810] Lustre: fir-MDT0000: haven't heard from client 73de8ba2-b841-92c8-7916-67aed8a86373 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8984506b5c00, cur 1549956372 expire 1549956222 last 1549956145 [439082.540600] Lustre: Skipped 5 previous similar messages [439387.148423] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [439387.158539] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [439387.168896] Lustre: Skipped 6 previous similar messages [439988.245054] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [439988.255169] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [439988.265523] Lustre: Skipped 3 previous similar messages [440311.549734] Lustre: fir-MDT0002: haven't heard from client ce363aa6-1c10-0fe8-eab0-7cf3f41dfcfa (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff896aa32f7400, cur 1549957601 expire 1549957451 last 1549957374 [440311.571521] Lustre: Skipped 5 previous similar messages [440589.341453] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [440589.351571] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [440589.361939] Lustre: Skipped 3 previous similar messages [440911.564722] Lustre: fir-MDT0000: haven't heard from client fe6d5948-0153-0ad3-9cf7-388515c5d491 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896e2d6d1000, cur 1549958201 expire 1549958051 last 1549957974 [440911.586511] Lustre: Skipped 2 previous similar messages [441190.437736] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [441190.447843] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [441190.458191] Lustre: Skipped 3 previous similar messages [441552.580620] Lustre: fir-MDT0002: haven't heard from client faecc65b-a876-9b84-42cb-8899b70ca835 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896d2da14c00, cur 1549958842 expire 1549958692 last 1549958615 [441552.602414] Lustre: Skipped 2 previous similar messages [441791.534006] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [441791.544122] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [441791.554488] Lustre: Skipped 3 previous similar messages [442025.592508] Lustre: fir-MDT0002: haven't heard from client fada419e-92f7-ae40-7a86-9cc984da6453 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff897d69a05000, cur 1549959315 expire 1549959165 last 1549959088 [442025.614315] Lustre: Skipped 2 previous similar messages [442392.630275] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [442392.640386] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [442392.650740] Lustre: Skipped 6 previous similar messages [442787.611536] Lustre: fir-MDT0000: haven't heard from client 2edc7408-e52b-f959-749f-bfa6ab1e5d05 (at 10.8.13.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897d3bb83400, cur 1549960077 expire 1549959927 last 1549959850 [442787.633328] Lustre: Skipped 8 previous similar messages [442993.726712] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [442993.736820] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [442993.747184] Lustre: Skipped 6 previous similar messages [443594.823761] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [443594.833874] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [443841.638027] Lustre: fir-MDT0002: haven't heard from client 2899a640-4f40-d14e-bd56-b5cdb0787778 (at 10.8.13.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff896e5f787000, cur 1549961131 expire 1549960981 last 1549960904 [443841.659817] Lustre: Skipped 2 previous similar messages [444195.915249] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [444195.925364] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [444195.935714] Lustre: Skipped 3 previous similar messages [444797.012824] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [444797.022942] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [445280.676703] Lustre: fir-MDT0002: haven't heard from client df7d381f-505f-c790-6c38-35b6b178c234 (at 10.8.13.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896962bac000, cur 1549962570 expire 1549962420 last 1549962343 [445280.698495] Lustre: Skipped 2 previous similar messages [445398.104213] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [445398.114325] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [445999.195403] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [445999.205518] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [445999.215863] Lustre: Skipped 3 previous similar messages [446008.523438] LustreError: 55226:0:(mdt_io.c:442:mdt_preprw_write()) fir-MDT0000: BRW to missing obj [0x20000177f:0x1d11:0x0] [446025.693829] Lustre: fir-MDT0000: haven't heard from client a3fed4c5-b2e1-5d47-a49f-3582a3c3f169 (at 10.8.13.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8995f32b7000, cur 1549963315 expire 1549963165 last 1549963088 [446025.715619] Lustre: Skipped 2 previous similar messages [446445.704769] Lustre: fir-MDT0000: haven't heard from client c7d0dd45-27eb-d5ba-bf0c-4953ab9273ee (at 10.8.13.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897d0d777000, cur 1549963735 expire 1549963585 last 1549963508 [446445.726562] Lustre: Skipped 2 previous similar messages [446600.292011] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [446600.302131] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [446600.312496] Lustre: Skipped 6 previous similar messages [446767.713072] Lustre: MGS: haven't heard from client 2573d144-a04e-d458-e49d-3e61b2eae92f (at 10.8.13.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89735aa96800, cur 1549964057 expire 1549963907 last 1549963830 [446767.734188] Lustre: Skipped 2 previous similar messages [447150.722048] Lustre: fir-MDT0002: haven't heard from client ae8c7931-eb93-9db6-aac4-6598d9a5d7a3 (at 10.8.9.8@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897f6fef0000, cur 1549964440 expire 1549964290 last 1549964213 [447150.743682] Lustre: Skipped 2 previous similar messages [447201.388566] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [447201.398679] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [447201.409023] Lustre: Skipped 6 previous similar messages [447226.724947] Lustre: fir-MDT0002: haven't heard from client facf6d73-ad5b-7135-db39-0467530bcc24 (at 10.8.13.14@o2ib6) in 189 seconds. I think it's dead, and I am evicting it. 
exp ffff8957c13e5400, cur 1549964516 expire 1549964366 last 1549964327 [447226.746732] Lustre: Skipped 2 previous similar messages [447456.728655] Lustre: fir-MDT0000: haven't heard from client 275a07e3-5074-a2be-55af-a08dee65a2ef (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff895809465800, cur 1549964746 expire 1549964596 last 1549964519 [447456.750450] Lustre: Skipped 2 previous similar messages [447802.484831] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [447802.494941] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [447802.505314] Lustre: Skipped 6 previous similar messages [447991.742069] Lustre: fir-MDT0000: haven't heard from client 12896ee4-d5d4-bf82-a2cc-3453c095ed54 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897312258c00, cur 1549965281 expire 1549965131 last 1549965054 [447991.763859] Lustre: Skipped 5 previous similar messages [448403.581102] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [448403.591217] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [448403.601569] Lustre: Skipped 6 previous similar messages [449004.677634] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [449004.687746] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [449004.698094] Lustre: Skipped 6 previous similar messages [449412.779341] Lustre: fir-MDT0002: haven't heard from client cb1d400c-8614-8c4c-5f2c-f8cee70b9bc5 (at 10.8.13.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff896c2a74a000, cur 1549966702 expire 1549966552 last 1549966475 [449412.801127] Lustre: Skipped 8 previous similar messages [449605.774910] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [449605.785023] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [449605.795402] Lustre: Skipped 3 previous similar messages [450013.793557] Lustre: fir-MDT0000: haven't heard from client 3e6b5098-cf07-7df3-a5e7-927e6c866e45 (at 10.8.13.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89581af1c400, cur 1549967303 expire 1549967153 last 1549967076 [450013.815347] Lustre: Skipped 5 previous similar messages [450206.872317] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [450206.882432] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [450206.892783] Lustre: Skipped 9 previous similar messages [450464.804335] Lustre: fir-MDT0000: haven't heard from client 011706f6-b77b-7e53-f93d-2135edabf570 (at 10.8.13.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89724875f800, cur 1549967754 expire 1549967604 last 1549967527 [450464.826119] Lustre: Skipped 5 previous similar messages [450807.969916] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [450807.980032] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [450807.990382] Lustre: Skipped 3 previous similar messages [451039.819559] Lustre: fir-MDT0000: haven't heard from client 704efce1-4cef-6d0a-20fd-970ef10d5957 (at 10.8.13.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff896cd86a2400, cur 1549968329 expire 1549968179 last 1549968102 [451039.841352] Lustre: Skipped 5 previous similar messages [451409.067111] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [451409.077226] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [451409.087594] Lustre: Skipped 9 previous similar messages [452010.163838] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [452010.173961] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [452010.184309] Lustre: Skipped 6 previous similar messages [452150.846442] Lustre: MGS: haven't heard from client ca476e3d-16d3-7d69-e455-6f04c413acf8 (at 10.8.13.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff898621676400, cur 1549969440 expire 1549969290 last 1549969213 [452150.867534] Lustre: Skipped 11 previous similar messages [452611.260147] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [452611.270259] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [452611.280628] Lustre: Skipped 3 previous similar messages [453166.872035] Lustre: fir-MDT0000: haven't heard from client 5e4dd06a-0dfe-9032-2507-679bb551c144 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff896df2a2e800, cur 1549970456 expire 1549970306 last 1549970229 [453166.893821] Lustre: Skipped 5 previous similar messages [453212.356304] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [453212.366443] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [453212.376805] Lustre: Skipped 3 previous similar messages [453813.453598] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [453813.463705] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [453813.474051] Lustre: Skipped 9 previous similar messages [454066.894773] Lustre: fir-MDT0000: haven't heard from client 385af1dc-8888-80ce-a7d9-bcb50a2e781c (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8983ff68e800, cur 1549971356 expire 1549971206 last 1549971129 [454066.916594] Lustre: Skipped 11 previous similar messages [454414.550834] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [454414.560967] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [454414.571321] Lustre: Skipped 6 previous similar messages [454896.915339] Lustre: fir-MDT0000: haven't heard from client e58d9fe4-5f25-58f4-4108-3f2c88a15dcb (at 10.8.13.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8984516ca400, cur 1549972186 expire 1549972036 last 1549971959 [454896.937127] Lustre: Skipped 8 previous similar messages [455015.649402] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [455015.659512] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [455015.669859] Lustre: Skipped 9 previous similar messages [455616.747133] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [455616.757247] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [455616.767594] Lustre: Skipped 6 previous similar messages [455836.942863] Lustre: fir-MDT0000: haven't heard from client 27ddd272-4261-18e0-2409-5188d4f6e834 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89663e3bbc00, cur 1549973126 expire 1549972976 last 1549972899 [455836.964652] Lustre: Skipped 11 previous similar messages [456217.845344] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [456217.855457] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [456217.865804] Lustre: Skipped 6 previous similar messages [456509.956524] Lustre: fir-MDT0000: haven't heard from client fb9f1d4d-f6a3-96f7-510b-4098ceb9c8a7 (at 10.8.18.35@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8975c2795c00, cur 1549973799 expire 1549973649 last 1549973572 [456509.978335] Lustre: Skipped 11 previous similar messages [456818.941595] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [456818.951703] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [456818.962054] Lustre: Skipped 12 previous similar messages [457420.037443] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [457420.047555] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [457420.057902] Lustre: Skipped 6 previous similar messages [457497.985678] Lustre: fir-MDT0002: haven't heard from client b1a4bbad-e34c-6175-8766-1db83179e2b2 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896a59badc00, cur 1549974787 expire 1549974637 last 1549974560 [457498.007466] Lustre: Skipped 8 previous similar messages [458021.133518] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [458021.143630] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [458021.153974] Lustre: Skipped 3 previous similar messages [458539.008525] Lustre: fir-MDT0002: haven't heard from client 67446f96-5bf5-79ff-bd25-3ae9af9957e7 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8971d5314c00, cur 1549975828 expire 1549975678 last 1549975601 [458539.030320] Lustre: Skipped 8 previous similar messages [458622.230393] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [458622.240504] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [458622.250848] Lustre: Skipped 3 previous similar messages [459223.328037] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [459223.338148] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [459223.348501] Lustre: Skipped 6 previous similar messages [459458.037978] Lustre: fir-MDT0002: haven't heard from client 27db760e-e539-8e03-8ff6-fda9186b88a3 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896e82220800, cur 1549976747 expire 1549976597 last 1549976520 [459458.059767] Lustre: Skipped 8 previous similar messages [459824.425694] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [459824.435807] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [459824.446175] Lustre: Skipped 6 previous similar messages [460280.050600] Lustre: fir-MDT0000: haven't heard from client ce789f13-bc41-259b-1230-a0fd4905bf23 (at 10.8.11.10@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff897087f0c000, cur 1549977569 expire 1549977419 last 1549977342 [460280.072396] Lustre: Skipped 5 previous similar messages [460425.524167] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [460425.534278] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [460425.544622] Lustre: Skipped 6 previous similar messages [461026.621076] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [461026.631192] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [461026.641535] Lustre: Skipped 15 previous similar messages [461036.070079] Lustre: fir-MDT0002: haven't heard from client 5d362216-ee4c-5f7d-fbc6-13e86939c064 (at 10.8.11.10@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8984a5be7c00, cur 1549978325 expire 1549978175 last 1549978098 [461036.091871] Lustre: Skipped 14 previous similar messages [461627.717817] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [461627.727933] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [461627.738320] Lustre: Skipped 9 previous similar messages [461706.086303] Lustre: fir-MDT0002: haven't heard from client 251e20c4-fa34-6f89-7833-5b3eb7340176 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff895c4a006800, cur 1549978995 expire 1549978845 last 1549978768
[461706.108095] Lustre: Skipped 11 previous similar messages
[462228.813935] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[462228.824049] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[462228.834418] Lustre: Skipped 12 previous similar messages
[462586.108824] Lustre: fir-MDT0000: haven't heard from client 23b08de2-de4c-8781-e471-209db9865d8b (at 10.8.13.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897cdd7f2800, cur 1549979875 expire 1549979725 last 1549979648
[462586.130612] Lustre: Skipped 14 previous similar messages
[462829.910213] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[462829.920397] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[462829.930768] Lustre: Skipped 11 previous similar messages
[463420.129604] Lustre: fir-MDT0000: haven't heard from client 861f7ff3-57ec-beb4-f6a3-ce754156bb5b (at 10.8.18.35@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89739fec8800, cur 1549980709 expire 1549980559 last 1549980482
[463420.151411] Lustre: Skipped 5 previous similar messages
[463431.006878] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[463431.016990] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[463431.027335] Lustre: Skipped 3 previous similar messages
[464032.104064] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[464032.114179] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[464032.124523] Lustre: Skipped 6 previous similar messages
[464060.145971] Lustre: fir-MDT0002: haven't heard from client ecff2dc5-a268-a260-15d4-ed4304ce6a2f (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896b827a5400, cur 1549981349 expire 1549981199 last 1549981122
[464060.167782] Lustre: Skipped 5 previous similar messages
[464633.202569] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[464633.212674] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[464633.223022] Lustre: Skipped 11 previous similar messages
[464806.170694] Lustre: MGS: haven't heard from client 3ed32ac1-10f9-88af-4663-b81a2c171cae (at 10.8.18.35@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89975fbc6400, cur 1549982095 expire 1549981945 last 1549981868
[464806.191814] Lustre: Skipped 13 previous similar messages
[465234.300161] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[465234.310270] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[465234.320613] Lustre: Skipped 6 previous similar messages
[465574.183351] Lustre: fir-MDT0002: haven't heard from client f44e6c74-d847-d23a-51be-efa4a0dfec42 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89603eb60000, cur 1549982863 expire 1549982713 last 1549982636
[465574.205162] Lustre: Skipped 5 previous similar messages
[465835.397506] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[465835.407625] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[465835.417989] Lustre: Skipped 6 previous similar messages
[466215.202234] Lustre: fir-MDT0002: haven't heard from client 1d4f825b-a785-a963-43e2-422b318dd2f2 (at 10.8.13.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff898581e8f400, cur 1549983504 expire 1549983354 last 1549983277
[466215.224032] Lustre: Skipped 13 previous similar messages
[466436.494517] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[466436.504633] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[466436.515002] Lustre: Skipped 15 previous similar messages
[467008.220324] Lustre: fir-MDT0000: haven't heard from client b10b41d1-e464-5315-c29a-615aef503c54 (at 10.8.18.35@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8977e5822800, cur 1549984297 expire 1549984147 last 1549984070
[467008.242113] Lustre: Skipped 14 previous similar messages
[467037.590985] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[467037.601102] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[467037.611448] Lustre: Skipped 9 previous similar messages
[467638.687252] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[467638.697363] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[467638.707710] Lustre: Skipped 9 previous similar messages
[467789.240590] Lustre: fir-MDT0002: haven't heard from client 615ca86a-4b1c-2253-ede2-5d0e29f69e5f (at 10.8.13.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8982a7a98c00, cur 1549985078 expire 1549984928 last 1549984851
[467789.262375] Lustre: Skipped 11 previous similar messages
[468239.783576] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[468239.793683] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[468239.804029] Lustre: Skipped 9 previous similar messages
[468527.258081] Lustre: fir-MDT0000: haven't heard from client 24dd6e64-5aea-b7a3-d459-b81ae26089bc (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8962df112800, cur 1549985816 expire 1549985666 last 1549985589
[468527.279873] Lustre: Skipped 11 previous similar messages
[468840.880155] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[468840.890267] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[468840.900619] Lustre: Skipped 9 previous similar messages
[469441.977488] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[469441.987607] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[469501.283420] Lustre: fir-MDT0000: haven't heard from client cb4584a8-cf44-2afa-f807-d06b86ec340c (at 10.8.18.35@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8976ecab0800, cur 1549986790 expire 1549986640 last 1549986563
[469501.305210] Lustre: Skipped 2 previous similar messages
[470043.069061] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[470043.079172] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[470043.089524] Lustre: Skipped 3 previous similar messages
[470418.305331] Lustre: fir-MDT0002: haven't heard from client c703af01-ea2f-81b6-cba4-3f1b6304e904 (at 10.8.13.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff898093b0d000, cur 1549987707 expire 1549987557 last 1549987480
[470418.327116] Lustre: Skipped 2 previous similar messages
[470644.166655] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[470644.176765] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[470644.187117] Lustre: Skipped 9 previous similar messages
[471049.320817] Lustre: fir-MDT0002: haven't heard from client 51b34506-e67d-dd20-08ed-ebfc7d4b46ec (at 10.8.13.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8974edfaf800, cur 1549988338 expire 1549988188 last 1549988111
[471049.342606] Lustre: Skipped 11 previous similar messages
[471245.263968] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[471245.274077] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[471245.284439] Lustre: Skipped 6 previous similar messages
[471846.361074] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[471846.371196] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[471846.381543] Lustre: Skipped 3 previous similar messages
[471900.345573] Lustre: fir-MDT0002: haven't heard from client 38c1d253-f418-86b9-0824-16146062dae3 (at 10.8.13.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8977df7fc000, cur 1549989189 expire 1549989039 last 1549988962
[471900.367366] Lustre: Skipped 5 previous similar messages
[472447.456917] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[472447.467026] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[472447.477370] Lustre: Skipped 3 previous similar messages
[472637.360519] Lustre: fir-MDT0000: haven't heard from client e74cc2ee-8963-d22d-fcb9-add9dd65caea (at 10.8.13.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89734f3d8800, cur 1549989926 expire 1549989776 last 1549989699
[472637.382312] Lustre: Skipped 2 previous similar messages
[473048.552637] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[473048.562751] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[473048.573095] Lustre: Skipped 3 previous similar messages
[473290.377088] Lustre: fir-MDT0000: haven't heard from client b2fe8c60-05ef-730f-ba4a-242450d17c2d (at 10.8.13.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8970d471b800, cur 1549990579 expire 1549990429 last 1549990352
[473290.398901] Lustre: Skipped 8 previous similar messages
[473649.649236] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[473649.659349] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[473649.669692] Lustre: Skipped 9 previous similar messages
[473906.396011] Lustre: fir-MDT0000: haven't heard from client dc1ec120-8bf5-7e9a-119f-082629c4da7b (at 10.8.13.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897549fb5c00, cur 1549991195 expire 1549991045 last 1549990968
[473906.417797] Lustre: Skipped 5 previous similar messages
[474250.745685] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[474250.755796] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[474250.766165] Lustre: Skipped 6 previous similar messages
[474851.842065] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[474851.852194] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[474851.862544] Lustre: Skipped 3 previous similar messages
[474976.419315] Lustre: fir-MDT0002: haven't heard from client 0d4f2f3d-d3b3-8237-4912-481d68c0fa91 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8960617e0400, cur 1549992265 expire 1549992115 last 1549992038
[474976.441123] Lustre: Skipped 5 previous similar messages
[475452.938570] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[475452.948687] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[475452.959038] Lustre: Skipped 9 previous similar messages
[475595.434846] Lustre: fir-MDT0002: haven't heard from client 5449e0a5-0d76-f4f2-56ed-e85dd13795e9 (at 10.8.13.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896a02743000, cur 1549992884 expire 1549992734 last 1549992657
[475595.456639] Lustre: Skipped 8 previous similar messages
[476054.034324] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[476054.044432] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[476054.054788] Lustre: Skipped 6 previous similar messages
[476223.458264] Lustre: fir-MDT0000: haven't heard from client 174f480f-c55e-5b39-3618-a781391d0bc2 (at 10.8.13.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff898315be0800, cur 1549993512 expire 1549993362 last 1549993285
[476223.480057] Lustre: Skipped 5 previous similar messages
[476655.130608] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[476655.140716] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[476655.151070] Lustre: Skipped 6 previous similar messages
[476866.466789] Lustre: fir-MDT0000: haven't heard from client b81f0187-91e5-066d-74b4-50740b410ac6 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8976af3e2800, cur 1549994155 expire 1549994005 last 1549993928
[476866.488595] Lustre: Skipped 8 previous similar messages
[476976.141396] Lustre: 56076:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549994257/real 1549994257] req@ffff896f87735d00 x1624928547696880/t0(0) o104->fir-MDT0000@10.9.101.1@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1549994264 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[476976.168753] Lustre: 56076:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 9 previous similar messages
[476990.178749] Lustre: 56076:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549994271/real 1549994271] req@ffff896f87735d00 x1624928547696880/t0(0) o104->fir-MDT0000@10.9.101.1@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1549994278 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[476990.206112] Lustre: 56076:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message
[477011.217278] Lustre: 56076:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549994292/real 1549994292] req@ffff896f87735d00 x1624928547696880/t0(0) o104->fir-MDT0000@10.9.101.1@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1549994299 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[477011.244611] Lustre: 56076:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
[477046.255157] Lustre: 56076:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549994327/real 1549994327] req@ffff896f87735d00 x1624928547696880/t0(0) o104->fir-MDT0000@10.9.101.1@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1549994334 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[477046.282498] Lustre: 56076:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages
[477256.227817] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[477256.237923] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[477256.248269] Lustre: Skipped 6 previous similar messages
[477654.705427] Lustre: 57742:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549994936/real 1549994936] req@ffff898603ac3c00 x1624928552278416/t0(0) o104->fir-MDT0000@10.8.9.8@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1549994943 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[477654.732592] Lustre: 57742:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages
[477664.488141] Lustre: MGS: haven't heard from client 15354d81-459e-5c30-0947-eb0f45d804ec (at 10.8.9.8@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89940822b400, cur 1549994953 expire 1549994803 last 1549994726
[477664.509069] Lustre: Skipped 11 previous similar messages
[477857.325021] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[477857.335150] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[477857.345511] Lustre: Skipped 9 previous similar messages
[478458.422397] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[478458.432505] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[478458.442846] Lustre: Skipped 6 previous similar messages
[478637.512554] Lustre: MGS: haven't heard from client b49e7a28-3af9-0635-5427-13712b9c2f2b (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8997efad8000, cur 1549995926 expire 1549995776 last 1549995699
[478637.533654] Lustre: Skipped 8 previous similar messages
[479059.519644] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[479059.529758] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[479059.540125] Lustre: Skipped 3 previous similar messages
[479660.615837] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[479660.625951] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[479660.636302] Lustre: Skipped 3 previous similar messages
[480261.711713] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[480261.721833] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[480261.732210] Lustre: Skipped 3 previous similar messages
[480519.558672] Lustre: fir-MDT0002: haven't heard from client cbc6bb65-d70e-8933-e7e2-42766c8bd4e0 (at 10.8.14.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896b4d2fb000, cur 1549997808 expire 1549997658 last 1549997581
[480519.580425] Lustre: Skipped 2 previous similar messages
[480862.808126] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[480862.818252] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[480866.272962] LustreError: 55373:0:(mdt_io.c:442:mdt_preprw_write()) fir-MDT0000: BRW to missing obj [0x200001767:0x1b32:0x0]
[481113.581155] Lustre: 56378:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549998395/real 1549998395] req@ffff896f87735d00 x1624928587131984/t0(0) o105->fir-MDT0002@10.8.7.15@o2ib6:15/16 lens 304/224 e 0 to 1 dl 1549998402 ref 1 fl Rpc:RX/0/ffffffff rc 0/-1
[481113.608610] Lustre: 56378:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
[481115.106195] Lustre: fir-OST001d-osc-MDT0000: Connection to fir-OST001d (at 10.0.10.106@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
[481115.122458] Lustre: Skipped 11 previous similar messages
[481201.422261] Lustre: 53955:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549998480/real 1549998480] req@ffff895b53e46600 x1624928590312912/t0(0) o13->fir-OST0001-osc-MDT0002@10.0.10.102@o2ib7:7/4 lens 224/368 e 0 to 1 dl 1549998488 ref 1 fl Rpc:RX/0/ffffffff rc 0/-1
[481201.450550] Lustre: 53955:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
[481201.460381] Lustre: fir-OST0001-osc-MDT0002: Connection to fir-OST0001 (at 10.0.10.102@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
[481201.476534] Lustre: Skipped 1 previous similar message
[481202.954485] Lustre: fir-OST002f-osc-MDT0002: Connection to fir-OST002f (at 10.0.10.108@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
[481202.970634] Lustre: Skipped 1 previous similar message
[481205.041629] Lustre: fir-OST0020-osc-MDT0002: Connection to fir-OST0020 (at 10.0.10.105@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
[481205.057788] Lustre: Skipped 3 previous similar messages
[481205.560351] Lustre: 58964:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 6s req@ffff897e447e3300 x1625285054527584/t0(0) o35->c3ee8e29-24b2-60ad-b950-c5ea318742ba@10.8.17.29@o2ib6:0/0 lens 392/0 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1
[481211.811842] Lustre: fir-OST002a-osc-MDT0000: Connection to fir-OST002a (at 10.0.10.107@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
[481211.827994] Lustre: Skipped 12 previous similar messages
[481245.412858] Lustre: 53917:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549998523/real 1549998526] req@ffff89642bbd4e00 x1624928590681584/t0(0) o13->fir-OST001c-osc-MDT0000@10.0.10.105@o2ib7:7/4 lens 224/368 e 0 to 1 dl 1549998531 ref 1 fl Rpc:RX/0/ffffffff rc 0/-1
[481245.441165] Lustre: 53917:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 24 previous similar messages
[481245.451087] Lustre: fir-OST001c-osc-MDT0000: Connection to fir-OST001c (at 10.0.10.105@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
[481245.467233] Lustre: Skipped 5 previous similar messages
[481266.282453] Lustre: fir-OST000e-osc-MDT0002: Connection to fir-OST000e (at 10.0.10.103@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
[481266.298610] Lustre: Skipped 12 previous similar messages
[481266.468835] LNet: 53887:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.210@o2ib7: 14 seconds
[481266.479180] LNet: 53887:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 215 previous similar messages
[481273.854335] Lustre: 54097:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (45:1s); client may timeout. req@ffff896cdcae9450 x1624697972802320/t0(0) o103->7030631e-b3d2-9eed-f765-9117cb5ba8a4@10.9.103.35@o2ib4:171/0 lens 328/0 e 0 to 0 dl 1549998561 ref 2 fl Interpret:H/0/ffffffff rc 0/-1
[481273.883528] LustreError: 54097:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.9.102.25@o2ib4: deadline 45:1s ago req@ffff896cedf8a700 x1624674455706128/t0(0) o103->7d6292c2-dc0a-0082-5273-c1ff8e6163ed@10.9.102.25@o2ib4:171/0 lens 328/0 e 0 to 0 dl 1549998561 ref 2 fl Interpret:H/0/ffffffff rc 0/-1
[481274.577894] Lustre: 56386:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (45:1s); client may timeout. req@ffff8973d2210300 x1624674455710752/t0(0) o103->7d6292c2-dc0a-0082-5273-c1ff8e6163ed@10.9.102.25@o2ib4:172/0 lens 328/192 e 0 to 0 dl 1549998562 ref 2 fl Complete:H/0/0 rc 0/0
[481274.591797] LustreError: 54091:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.9.102.25@o2ib4: deadline 45:1s ago req@ffff8973d2213c00 x1624674455710928/t0(0) o103->7d6292c2-dc0a-0082-5273-c1ff8e6163ed@10.9.102.25@o2ib4:172/0 lens 328/0 e 0 to 0 dl 1549998562 ref 2 fl Interpret:H/0/ffffffff rc 0/-1
[481274.638636] Lustre: 56386:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 241 previous similar messages
[481275.194162] Lustre: 56355:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-1s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff896c9df3c500 x1624674455713840/t0(0) o103->7d6292c2-dc0a-0082-5273-c1ff8e6163ed@10.9.102.25@o2ib4:172/0 lens 328/0 e 0 to 0 dl 1549998562 ref 1 fl Interpret:H/0/ffffffff rc 0/-1
[481276.231358] Lustre: 54099:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (45:2s); client may timeout. req@ffff89723d361500 x1624674455710512/t0(0) o103->7d6292c2-dc0a-0082-5273-c1ff8e6163ed@10.9.102.25@o2ib4:172/0 lens 328/192 e 0 to 0 dl 1549998562 ref 1 fl Complete:H/0/0 rc 0/0
[481276.259912] Lustre: 54099:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 15 previous similar messages
[481277.862861] Lustre: 56359:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-3s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff897257753600 x1624674455771664/t0(0) o103->7d6292c2-dc0a-0082-5273-c1ff8e6163ed@10.9.102.25@o2ib4:173/0 lens 328/0 e 0 to 0 dl 1549998563 ref 2 fl New:H/0/ffffffff rc 0/-1
[481277.894014] Lustre: 56359:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 369 previous similar messages
[481279.034131] LustreError: 55292:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff896c55f6a450 x1624753318356464/t0(0) o4->557e9a9e-3aaf-5df1-d787-57a5a1219217@10.8.3.31@o2ib6:219/0 lens 504/448 e 0 to 0 dl 1549998609 ref 1 fl Interpret:/0/0 rc 0/0
[481279.058193] Lustre: fir-MDT0002: Bulk IO write error with 557e9a9e-3aaf-5df1-d787-57a5a1219217 (at 10.8.3.31@o2ib6), client will retry: rc = -110
[481281.237152] LustreError: 54087:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.9.102.25@o2ib4: deadline 46:6s ago req@ffff898755aa1e00 x1624674455706640/t0(0) o103->7d6292c2-dc0a-0082-5273-c1ff8e6163ed@10.9.102.25@o2ib4:173/0 lens 328/0 e 0 to 0 dl 1549998563 ref 1 fl Interpret:H/0/ffffffff rc 0/-1
[481281.265423] Lustre: 56375:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (46:6s); client may timeout. req@ffff89810e671500 x1624674455707712/t0(0) o103->7d6292c2-dc0a-0082-5273-c1ff8e6163ed@10.9.102.25@o2ib4:173/0 lens 328/0 e 0 to 0 dl 1549998563 ref 1 fl Interpret:H/0/ffffffff rc 0/-1
[481281.265427] Lustre: 56375:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1 previous similar message
[481281.308600] LustreError: 54087:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 187 previous similar messages
[481290.961730] Lustre: 56359:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (45:16s); client may timeout. req@ffff8977d2b12700 x1624674455770064/t0(0) o103->7d6292c2-dc0a-0082-5273-c1ff8e6163ed@10.9.102.25@o2ib4:173/0 lens 328/0 e 0 to 0 dl 1549998563 ref 1 fl Interpret:H/0/ffffffff rc 0/-1
[481290.990973] Lustre: 56359:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 437 previous similar messages
[481292.805148] Lustre: MGS: Received new LWP connection from 10.9.104.40@o2ib4, removing former export from same NID
[481293.929435] Lustre: MGS: Received new LWP connection from 10.9.113.10@o2ib4, removing former export from same NID
[481295.015158] Lustre: MGS: Received new LWP connection from 10.9.107.9@o2ib4, removing former export from same NID
[481295.025427] Lustre: Skipped 7 previous similar messages
[481297.335236] Lustre: MGS: Received new LWP connection from 10.9.102.9@o2ib4, removing former export from same NID
[481297.345492] Lustre: Skipped 14 previous similar messages
[481299.861337] Lustre: fir-OST0027-osc-MDT0002: Connection to fir-OST0027 (at 10.0.10.108@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
[481299.877495] Lustre: Skipped 10 previous similar messages
[481303.885333] LustreError: 55216:0:(sec.c:2362:sptlrpc_svc_unwrap_bulk()) @@@ truncated bulk GET 0(69632) req@ffff896319b2d050 x1624748387623504/t0(0) o4->25262cb2-e449-1554-b5d1-0b6e448154f1@10.9.106.18@o2ib4:215/0 lens 488/448 e 1 to 0 dl 1549998605 ref 1 fl Interpret:/0/0 rc 0/0
[481303.910268] Lustre: fir-MDT0002: Bulk IO write error with 25262cb2-e449-1554-b5d1-0b6e448154f1 (at 10.9.106.18@o2ib4), client will retry: rc = -110
[481304.197077] Lustre: MGS: Received new LWP connection from 10.9.103.14@o2ib4, removing former export from same NID
[481304.207423] Lustre: Skipped 22 previous similar messages
[481306.007185] LustreError: 54055:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 100s: evicting client at 10.9.103.35@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff898095399b00/0x1a4b7ac73a73a2db lrc: 3/0,0 mode: PR/PR res: [0x2c00033c7:0x2b6e:0x0].0x0 bits 0x40/0x0 rrc: 110763 type: IBT flags: 0x60000400010020 nid: 10.9.103.35@o2ib4 remote: 0x8299540b6fa0e5a5 expref: 117590 pid: 54716 timeout: 481293 lvb_type: 0
[481309.289995] LustreError: 54055:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 100s: evicting client at 10.9.103.35@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff898170233600/0x1a4b7ac73a72a17f lrc: 3/0,0 mode: PR/PR res: [0x2c00033c7:0x2b6e:0x0].0x0 bits 0x40/0x0 rrc: 110764 type: IBT flags: 0x60000400010020 nid: 10.9.103.35@o2ib4 remote: 0x8299540b6fa0c9a5 expref: 117584 pid: 56066 timeout: 481296 lvb_type: 0
[481309.466106] LustreError: 63820:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff899632e9ec00 x1624928590914144/t0(0) o104->fir-MDT0002@10.9.103.35@o2ib4:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
[481309.874938] Lustre: 53952:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549998590/real 1549998591] req@ffff8993faec8600 x1624928590830000/t0(0) o13->fir-OST0016-osc-MDT0002@10.0.10.103@o2ib7:7/4 lens 224/368 e 0 to 1 dl 1549998597 ref 1 fl Rpc:RX/2/ffffffff rc 0/-1
[481309.904237] Lustre: 53952:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 34 previous similar messages
[481310.177969] LustreError: 63820:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff899632e99800 x1624928590914448/t0(0) o104->fir-MDT0002@10.9.103.35@o2ib4:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
[481310.199148] LustreError: 63820:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 2 previous similar messages
[481310.291526] LustreError: 54055:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 101s: evicting client at 10.9.103.35@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff89751b2445c0/0x1a4b7ac73a72d947 lrc: 3/0,0 mode: PR/PR res: [0x2c00033c7:0x2b6e:0x0].0x0 bits 0x40/0x0 rrc: 110763 type: IBT flags: 0x60000400010020 nid: 10.9.103.35@o2ib4 remote: 0x8299540b6fa0cf32 expref: 117581 pid: 54716 timeout: 481296 lvb_type: 0
[481310.330305] LustreError: 54055:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 172 previous similar messages
[481311.519274] LustreError: 63820:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff899632e99200 x1624928590914624/t0(0) o104->fir-MDT0002@10.9.103.35@o2ib4:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
[481311.540454] LustreError: 63820:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 1 previous similar message
[481312.292565] LustreError: 54055:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 103s: evicting client at 10.9.103.35@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8987b86aa880/0x1a4b7ac73a7341de lrc: 3/0,0 mode: PR/PR res: [0x2c00033c7:0x2b6e:0x0].0x0 bits 0x40/0x0 rrc: 110760 type: IBT flags: 0x60000400010020 nid: 10.9.103.35@o2ib4 remote: 0x8299540b6fa0db3a expref: 117578 pid: 56066 timeout: 481296 lvb_type: 0
[481312.307579] LustreError: 54368:0:(osp_precreate.c:940:osp_precreate_cleanup_orphans()) fir-OST0016-osc-MDT0002: cannot cleanup orphans: rc = -11
[481312.344365] LustreError: 54055:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 343 previous similar messages
[481313.301798] Lustre: MGS: Received new LWP connection from 10.9.106.6@o2ib4, removing former export from same NID
[481313.312055] Lustre: Skipped 22 previous similar messages
[481313.642406] LustreError: 63820:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff899632e9b600 x1624928590922224/t0(0) o104->fir-MDT0002@10.9.103.35@o2ib4:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
[481313.663585] LustreError: 63820:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 6 previous similar messages
[481316.320266] LustreError: 54055:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 100s: evicting client at 10.9.103.35@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff89961aaad580/0x1a4b7ac73a6f769e lrc: 3/0,0 mode: PR/PR res: [0x2c00033c7:0x2b6e:0x0].0x0 bits 0x40/0x0 rrc: 110758 type: IBT flags: 0x60000400010020 nid: 10.9.103.35@o2ib4 remote: 0x8299540b6fa07979 expref: 117576 pid: 54706 timeout: 481303 lvb_type: 0
[481316.359046] LustreError: 54055:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 516 previous similar messages
[481318.708901] LustreError: 63820:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff899632e9b300 x1624928590945600/t0(0) o104->fir-MDT0002@10.9.103.35@o2ib4:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
[481323.446570] Lustre: 54843:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 7s req@ffff89842ff76850 x1624928590928800/t0(0) o400->fir-MDT0000-mdtlov_UUID@0@lo:0/0 lens 224/0 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1
[481323.468354] Lustre: 54843:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 10 previous similar messages
[481324.324022] LustreError: 54055:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 102s: evicting client at 10.9.103.35@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8994373ca1c0/0x1a4b7ac73a6a7ba9 lrc: 3/0,0 mode: PR/PR res: [0x2c00033c7:0x2b6e:0x0].0x0 bits 0x40/0x0 rrc: 110608 type: IBT flags: 0x60000400010020 nid: 10.9.103.35@o2ib4 remote: 0x8299540b6f9fff54 expref: 117425 pid: 54706 timeout: 481309 lvb_type: 0
[481324.362793] LustreError: 54055:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 415 previous similar messages
[481325.657093] Lustre: 54044:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:53s); client may timeout. req@ffff896d85778300 x1624752468943280/t0(0) o103->1ee3037c-52dd-207d-3196-b589ce5ac006@10.9.114.14@o2ib4:171/0 lens 328/0 e 0 to 0 dl 1549998561 ref 1 fl Interpret:/2/ffffffff rc 0/-1
[481325.686163] Lustre: 54044:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 10 previous similar messages
[481326.684151] LNet: Service thread pid 56062 was inactive for 201.01s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[481326.701176] Pid: 56062, comm: mdt01_039 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
[481326.711003] Call Trace:
[481326.713563] [] __cond_resched+0x26/0x30
[481326.719167] [] __kmalloc+0x55/0x230
[481326.724417] [] null_alloc_repbuf+0x140/0x2d0 [ptlrpc]
[481326.731291] [] sptlrpc_cli_alloc_repbuf+0x125/0x200 [ptlrpc]
[481326.738763] [] ptl_send_rpc+0x9bb/0xe70 [ptlrpc]
[481326.745183] [] ptlrpc_send_new_req+0x450/0xa60 [ptlrpc]
[481326.752213] [] ptlrpc_set_add_req+0x84/0x100 [ptlrpc]
[481326.759067] [] ldlm_server_blocking_ast+0x7a8/0xa40 [ptlrpc]
[481326.766521] [] tgt_blocking_ast+0x159/0x630 [ptlrpc]
[481326.773301] [] ldlm_work_bl_ast_lock+0x11c/0x300 [ptlrpc]
[481326.780494] [] ptlrpc_check_set.part.23+0x600/0x1df0 [ptlrpc]
[481326.788034] [] ptlrpc_check_set+0x5b/0xe0 [ptlrpc]
[481326.794631] [] ptlrpc_set_wait+0x537/0x8d0 [ptlrpc]
[481326.801302] [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc]
[481326.808063] [] ldlm_handle_conflict_lock+0x70/0x320 [ptlrpc]
[481326.815515] [] ldlm_lock_enqueue+0x2e3/0xa60 [ptlrpc]
[481326.822363] [] ldlm_cli_enqueue_local+0x1cc/0x870 [ptlrpc]
[481326.829642] [] mdt_object_local_lock+0x50b/0xb20 [mdt]
[481326.836560] [] mdt_object_lock_internal+0x70/0x3e0 [mdt]
[481326.843640] [] mdt_object_lock+0x20/0x30 [mdt]
[481326.849854] [] mdt_brw_enqueue+0x44b/0x760 [mdt]
[481326.856251] [] mdt_intent_brw+0x1f/0x30 [mdt]
[481326.862378] [] mdt_intent_policy+0x2e8/0xd00 [mdt]
[481326.868939] [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc]
[481326.875785] [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc]
[481326.882978] [] tgt_enqueue+0x62/0x210 [ptlrpc]
[481326.889228] [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[481326.896255] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[481326.904070] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
[481326.910487] [] kthread+0xd1/0xe0
[481326.915479] [] ret_from_fork_nospec_begin+0xe/0x21
[481326.922040] [] 0xffffffffffffffff
[481326.927148] LustreError: dumping log to /tmp/lustre-log.1549998615.56062
[481327.945902] LustreError: 63820:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff899632e9bf00 x1624928590968880/t0(0) o104->fir-MDT0002@10.9.103.35@o2ib4:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
[481327.967078] LustreError: 63820:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 15 previous similar messages
[481328.589391] LustreError: 55155:0:(sec.c:2362:sptlrpc_svc_unwrap_bulk()) @@@ truncated bulk GET 0(40960) req@ffff89839a320050 x1624669890325488/t0(0) o4->c3ba6e64-2791-36bf-c905-71dc1f9569f2@10.9.101.10@o2ib4:234/0 lens 488/448 e 1 to 0 dl 1549998624 ref 1 fl Interpret:/0/0 rc 0/0
[481328.614295] LustreError: 55155:0:(sec.c:2362:sptlrpc_svc_unwrap_bulk()) Skipped 1 previous similar message
[481328.624051] Lustre: fir-MDT0002: Bulk IO write error with c3ba6e64-2791-36bf-c905-71dc1f9569f2 (at 10.9.101.10@o2ib4), client will retry: rc = -110
[481328.637340] Lustre: Skipped 1 previous similar message
[481329.352623] NMI watchdog: BUG: soft lockup - CPU#28 stuck for 22s! [ldlm_bl_03:55962]
[481329.360536] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin dcdbas amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ses dm_multipath ipmi_si enclosure pcspkr dm_mod sg ipmi_devintf ccp i2c_piix4 ipmi_msghandler k10temp acpi_power_meter knem(OE) ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif
[481329.433535] crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper ahci syscopyarea mlx5_core(OE) sysfillrect sysimgblt libahci mlxfw(OE) fb_sys_fops devlink ttm crct10dif_pclmul tg3 crct10dif_common mlx_compat(OE) drm megaraid_sas crc32c_intel libata ptp drm_panel_orientation_quirks pps_core mpt3sas(OE) raid_class scsi_transport_sas [last unloaded: libcfs]
[481329.467218] CPU: 28 PID: 55962 Comm: ldlm_bl_03 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1
[481329.479634] Hardware name: Dell Inc.
PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018 [481329.487289] task: ffff8997eeacd140 ti: ffff8997fb308000 task.ti: ffff8997fb308000 [481329.494855] RIP: 0010:[] [] ldlm_inodebits_compat_queue+0x19f/0x440 [ptlrpc] [481329.505271] RSP: 0018:ffff8997fb30bbb0 EFLAGS: 00000246 [481329.510671] RAX: 0000000000000001 RBX: ffff89752aeabd20 RCX: ffff89752aeabd20 [481329.517891] RDX: ffff8997fb30bca8 RSI: ffff8995f523f740 RDI: ffff89656a2a60c0 [481329.525109] RBP: ffff8997fb30bc08 R08: ffff8997fb30bca8 R09: 00000000c0005f35 [481329.532331] R10: 0000000000000035 R11: ffff8995f523f740 R12: ffff8997fb30bca8 [481329.539550] R13: 00000000c0005f35 R14: 0000000000000035 R15: ffff8995f523f740 [481329.546770] FS: 00007fb564ea5700(0000) GS:ffff8967fefc0000(0000) knlGS:0000000000000000 [481329.554942] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [481329.560775] CR2: 00007fc038f6a000 CR3: 0000003dc7410000 CR4: 00000000003407e0 [481329.567993] Call Trace: [481329.570575] [] ldlm_process_inodebits_lock+0x93/0x3d0 [ptlrpc] [481329.578179] [] ? ldlm_export_lock_object+0x10/0x10 [ptlrpc] [481329.585519] [] ? ldlm_inodebits_compat_queue+0x440/0x440 [ptlrpc] [481329.593377] [] ldlm_reprocess_queue+0x1be/0x3f0 [ptlrpc] [481329.600457] [] __ldlm_reprocess_all+0x10b/0x380 [ptlrpc] [481329.607531] [] ldlm_cancel_lock_for_export.isra.26+0x1c2/0x390 [ptlrpc] [481329.615912] [] ldlm_export_cancel_blocked_locks+0x121/0x200 [ptlrpc] [481329.624035] [] ldlm_bl_thread_main+0x112/0x700 [ptlrpc] [481329.630999] [] ? wake_up_state+0x20/0x20 [481329.636688] [] ? ldlm_handle_bl_callback+0x530/0x530 [ptlrpc] [481329.644169] [] kthread+0xd1/0xe0 [481329.649133] [] ? insert_kthread_work+0x40/0x40 [481329.655315] [] ret_from_fork_nospec_begin+0xe/0x21 [481329.661840] [] ? 
insert_kthread_work+0x40/0x40 [481329.668018] Code: 4c 8d a8 f0 fd ff ff 4d 39 fd 74 2b 49 83 bd a8 00 00 00 00 74 0e 4c 89 f2 48 89 de 4c 89 ef e8 a8 a2 fc ff 4d 8b ad 10 02 00 00 <49> 81 ed 10 02 00 00 4d 39 fd 75 d5 31 f6 48 8b 45 c0 48 39 45 [481333.077537] Lustre: 8631:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 7s req@ffff898327ba0450 x1624736713331008/t0(0) o400->a0da43e6-c6eb-c71e-8468-9ff62fbf9bd7@10.8.22.34@o2ib6:0/0 lens 224/0 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1 [481333.171715] NMI watchdog: BUG: soft lockup - CPU#7 stuck for 22s! [migration/7:47] [481333.179368] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin dcdbas amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ses dm_multipath ipmi_si enclosure pcspkr dm_mod sg ipmi_devintf ccp i2c_piix4 ipmi_msghandler k10temp acpi_power_meter knem(OE) ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif [481333.252371] crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper ahci syscopyarea mlx5_core(OE) sysfillrect sysimgblt libahci mlxfw(OE) fb_sys_fops devlink ttm crct10dif_pclmul tg3 crct10dif_common mlx_compat(OE) drm megaraid_sas crc32c_intel libata ptp drm_panel_orientation_quirks pps_core mpt3sas(OE) raid_class scsi_transport_sas [last unloaded: libcfs] [481333.286052] CPU: 7 PID: 47 Comm: migration/7 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1 [481333.298208] Hardware name: Dell Inc. 
PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018 [481333.305863] task: ffff8988a9fc5140 ti: ffff8988a9ff8000 task.ti: ffff8988a9ff8000 [481333.313428] RIP: 0010:[] [] multi_cpu_stop+0x4a/0x110 [481333.321802] RSP: 0000:ffff8988a9ffbd98 EFLAGS: 00000246 [481333.327200] RAX: 0000000000000001 RBX: ffff8988a9619800 RCX: dead000000000200 [481333.334420] RDX: ffff89983f455ff0 RSI: 0000000000000282 RDI: ffff8987d03cbab0 [481333.341639] RBP: ffff8988a9ffbdc0 R08: ffff8987d03cba50 R09: 0000000000000001 [481333.348859] R10: 000000000000babf R11: 0000000000000001 R12: ffff8988a9ffbd20 [481333.356078] R13: ffff8967d9775140 R14: ffffffff9d6d2eb2 R15: ffff8988a9ffbd00 [481333.363298] FS: 00007fa72ea77700(0000) GS:ffff89983f440000(0000) knlGS:0000000000000000 [481333.371471] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [481333.377304] CR2: 00007fa720095060 CR3: 000000300da98000 CR4: 00000000003407e0 [481333.384522] Call Trace: [481333.387065] [] ? cpu_stop_should_run+0x50/0x50 [481333.393243] [] cpu_stopper_thread+0x99/0x150 [481333.399250] [] ? __schedule+0x3ff/0x890 [481333.404823] [] smpboot_thread_fn+0x144/0x1a0 [481333.410827] [] ? lg_double_unlock+0x40/0x40 [481333.416748] [] kthread+0xd1/0xe0 [481333.421714] [] ? insert_kthread_work+0x40/0x40 [481333.427894] [] ret_from_fork_nospec_begin+0xe/0x21 [481333.434419] [] ? 
insert_kthread_work+0x40/0x40 [481333.440597] Code: 66 90 66 90 49 89 c5 48 8b 47 18 48 85 c0 0f 84 b3 00 00 00 0f a3 18 19 db 85 db 41 0f 95 c6 45 31 ff 31 c0 0f 1f 44 00 00 f3 90 <41> 8b 5c 24 20 39 c3 74 5d 83 fb 02 74 68 83 fb 03 75 05 45 84 [481333.680576] Lustre: MGS: Received new LWP connection from 10.9.112.12@o2ib4, removing former export from same NID [481333.691256] Lustre: Skipped 33 previous similar messages [481333.742723] LustreError: 56382:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.9.103.35@o2ib4 arrived at 1549998594 with bad export cookie 1894743047037864052 [481333.758364] LustreError: 56382:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 2 previous similar messages [481334.789340] LustreError: 54253:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff896bff27d050 x1624712091996144/t0(0) o4->bdf3d0f5-851d-26cd-ba90-9355de313856@10.8.6.19@o2ib6:305/0 lens 504/448 e 0 to 0 dl 1549998695 ref 1 fl Interpret:/0/0 rc 0/0 [481334.813420] Lustre: fir-MDT0002: Bulk IO write error with bdf3d0f5-851d-26cd-ba90-9355de313856 (at 10.8.6.19@o2ib6), client will retry: rc = -110 [481337.431673] LustreError: 56355:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.9.103.35@o2ib4 arrived at 1549998601 with bad export cookie 1894743047037864052 [481337.447313] LustreError: 56355:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 49 previous similar messages [481341.043226] LustreError: 54055:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 99s: evicting client at 10.9.103.35@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8994343d3f00/0x1a4b7ac73a678977 lrc: 3/0,0 mode: PR/PR res: [0x2c00033c7:0x2b6e:0x0].0x0 bits 0x40/0x0 rrc: 109808 type: IBT flags: 0x60000400010020 nid: 10.9.103.35@o2ib4 remote: 0x8299540b6f9fbb22 expref: 114312 pid: 57441 timeout: 481328 lvb_type: 0 [481341.081921] LustreError: 54055:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 669 previous similar messages 
[481344.219769] LustreError: 63820:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff896fe8733c00 x1624928590973760/t0(0) o104->fir-MDT0002@10.9.103.35@o2ib4:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 [481344.240952] LustreError: 63820:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 23 previous similar messages [481345.812661] LNet: 53887:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 8 seconds [481345.822917] LNet: 53887:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 20 previous similar messages [481347.360284] LustreError: 56261:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.9.103.35@o2ib4 arrived at 1549998594 with bad export cookie 1894743047037864052 [481347.370142] LustreError: 56419:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.9.103.35@o2ib4: deadline 45:2s ago req@ffff89819f3efb00 x1624697972806704/t0(0) o103->7030631e-b3d2-9eed-f765-9117cb5ba8a4@10.9.103.35@o2ib4:243/0 lens 328/0 e 0 to 0 dl 1549998633 ref 1 fl Interpret:H/2/ffffffff rc 0/-1 [481347.370146] LustreError: 56419:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 336 previous similar messages [481347.370155] Lustre: 56419:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (45:2s); client may timeout. 
req@ffff89819f3efb00 x1624697972806704/t0(0) o103->7030631e-b3d2-9eed-f765-9117cb5ba8a4@10.9.103.35@o2ib4:243/0 lens 328/0 e 0 to 0 dl 1549998633 ref 1 fl Interpret:H/2/ffffffff rc 0/-1 [481347.448036] LustreError: 56261:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 146 previous similar messages [481349.443091] LustreError: 54100:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.9.103.35@o2ib4 arrived at 1549998598 with bad export cookie 1894743047037864052 [481349.458727] LustreError: 54100:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 107 previous similar messages [481353.442420] LustreError: 56407:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.9.103.35@o2ib4 arrived at 1549998599 with bad export cookie 1894743047037864052 [481353.458199] LustreError: 56407:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 958 previous similar messages [481353.497349] LustreError: 57742:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8997ebf74800 ns: mdt-fir-MDT0000_UUID lock: ffff898700636300/0x1a4b7ac741a555c4 lrc: 3/0,0 mode: PR/PR res: [0x20000189e:0xa4f:0x0].0x0 bits 0x13/0x0 rrc: 16 type: IBT flags: 0x50200000000000 nid: 10.9.101.24@o2ib4 remote: 0x8ce7b5d9a0c5a09 expref: 3 pid: 57742 timeout: 0 lvb_type: 0 [481357.353325] NMI watchdog: BUG: soft lockup - CPU#28 stuck for 22s! 
[ldlm_bl_03:55962] [481357.361238] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin dcdbas amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ses dm_multipath ipmi_si enclosure pcspkr dm_mod sg ipmi_devintf ccp i2c_piix4 ipmi_msghandler k10temp acpi_power_meter knem(OE) ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif [481357.434239] crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper ahci syscopyarea mlx5_core(OE) sysfillrect sysimgblt libahci mlxfw(OE) fb_sys_fops devlink ttm crct10dif_pclmul tg3 crct10dif_common mlx_compat(OE) drm megaraid_sas crc32c_intel libata ptp drm_panel_orientation_quirks pps_core mpt3sas(OE) raid_class scsi_transport_sas [last unloaded: libcfs] [481357.467920] CPU: 28 PID: 55962 Comm: ldlm_bl_03 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1 [481357.480337] Hardware name: Dell Inc. 
PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018 [481357.487992] task: ffff8997eeacd140 ti: ffff8997fb308000 task.ti: ffff8997fb308000 [481357.495557] RIP: 0010:[] [] ldlm_inodebits_compat_queue+0x188/0x440 [ptlrpc] [481357.505968] RSP: 0018:ffff8997fb30bbb0 EFLAGS: 00000282 [481357.511366] RAX: 0000000000000101 RBX: ffff89752aeabd20 RCX: ffff89752aeabd20 [481357.518585] RDX: ffff8997fb30bca8 RSI: ffff8995f523f740 RDI: ffff8996f7376300 [481357.525806] RBP: ffff8997fb30bc08 R08: ffff8997fb30bca8 R09: 00000000c0005f35 [481357.533025] R10: 0000000000000035 R11: ffff8995f523f740 R12: ffff8997fb30bca8 [481357.540244] R13: 00000000c0005f35 R14: 0000000000000035 R15: ffff8995f523f740 [481357.547465] FS: 00007fb564ea5700(0000) GS:ffff8967fefc0000(0000) knlGS:0000000000000000 [481357.555636] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [481357.561470] CR2: 00007fc038f6a000 CR3: 0000003dc7410000 CR4: 00000000003407e0 [481357.568689] Call Trace: [481357.571267] [] ldlm_process_inodebits_lock+0x93/0x3d0 [ptlrpc] [481357.578834] [] ? _raw_write_lock+0x10/0x20 [481357.584694] [] ? ldlm_inodebits_compat_queue+0x440/0x440 [ptlrpc] [481357.592550] [] ldlm_reprocess_queue+0x1be/0x3f0 [ptlrpc] [481357.599622] [] __ldlm_reprocess_all+0x10b/0x380 [ptlrpc] [481357.606695] [] ldlm_cancel_lock_for_export.isra.26+0x1c2/0x390 [ptlrpc] [481357.615068] [] ldlm_export_cancel_blocked_locks+0x121/0x200 [ptlrpc] [481357.623182] [] ldlm_bl_thread_main+0x112/0x700 [ptlrpc] [481357.630140] [] ? wake_up_state+0x20/0x20 [481357.635828] [] ? ldlm_handle_bl_callback+0x530/0x530 [ptlrpc] [481357.643313] [] kthread+0xd1/0xe0 [481357.648278] [] ? insert_kthread_work+0x40/0x40 [481357.654459] [] ret_from_fork_nospec_begin+0xe/0x21 [481357.660982] [] ? 
insert_kthread_work+0x40/0x40 [481357.667160] Code: 74 0e 4c 89 f2 48 89 de 4c 89 ff e8 d3 a2 fc ff 49 8b 87 10 02 00 00 4c 8d a8 f0 fd ff ff 4d 39 fd 74 2b 49 83 bd a8 00 00 00 00 <74> 0e 4c 89 f2 48 89 de 4c 89 ef e8 a8 a2 fc ff 4d 8b ad 10 02 [481359.539396] LustreError: 55169:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff89652cea0450 x1624745413640320/t0(0) o4->400e2c70-3670-eb05-66c0-e754ea5cd280@10.8.29.7@o2ib6:278/0 lens 488/448 e 1 to 0 dl 1549998668 ref 1 fl Interpret:/0/0 rc 0/0 [481359.563480] Lustre: fir-MDT0000: Bulk IO write error with 400e2c70-3670-eb05-66c0-e754ea5cd280 (at 10.8.29.7@o2ib6), client will retry: rc = -110 [481361.172416] NMI watchdog: BUG: soft lockup - CPU#7 stuck for 22s! [migration/7:47] [481361.180072] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin dcdbas amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ses dm_multipath ipmi_si enclosure pcspkr dm_mod sg ipmi_devintf ccp i2c_piix4 ipmi_msghandler k10temp acpi_power_meter knem(OE) ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif [481361.253072] crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper ahci syscopyarea mlx5_core(OE) sysfillrect sysimgblt libahci mlxfw(OE) fb_sys_fops devlink ttm crct10dif_pclmul tg3 crct10dif_common mlx_compat(OE) drm megaraid_sas crc32c_intel libata ptp drm_panel_orientation_quirks pps_core mpt3sas(OE) raid_class scsi_transport_sas [last unloaded: libcfs] [481361.286755] CPU: 7 PID: 
47 Comm: migration/7 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1 [481361.298913] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018 [481361.306567] task: ffff8988a9fc5140 ti: ffff8988a9ff8000 task.ti: ffff8988a9ff8000 [481361.314133] RIP: 0010:[] [] multi_cpu_stop+0x4a/0x110 [481361.322505] RSP: 0000:ffff8988a9ffbd98 EFLAGS: 00000246 [481361.327904] RAX: 0000000000000001 RBX: ffff8988a9619800 RCX: dead000000000200 [481361.335122] RDX: ffff89983f455ff0 RSI: 0000000000000282 RDI: ffff8987d03cbab0 [481361.342342] RBP: ffff8988a9ffbdc0 R08: ffff8987d03cba50 R09: 0000000000000001 [481361.349563] R10: 000000000000babf R11: 0000000000000001 R12: ffff8988a9ffbd20 [481361.356781] R13: ffff8967d9775140 R14: ffffffff9d6d2eb2 R15: ffff8988a9ffbd00 [481361.364002] FS: 00007fa72ea77700(0000) GS:ffff89983f440000(0000) knlGS:0000000000000000 [481361.372173] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [481361.378005] CR2: 00007fa720095060 CR3: 000000300da98000 CR4: 00000000003407e0 [481361.385227] Call Trace: [481361.387767] [] ? cpu_stop_should_run+0x50/0x50 [481361.393945] [] cpu_stopper_thread+0x99/0x150 [481361.399952] [] ? __schedule+0x3ff/0x890 [481361.405527] [] smpboot_thread_fn+0x144/0x1a0 [481361.411531] [] ? lg_double_unlock+0x40/0x40 [481361.417449] [] kthread+0xd1/0xe0 [481361.422417] [] ? insert_kthread_work+0x40/0x40 [481361.428596] [] ret_from_fork_nospec_begin+0xe/0x21 [481361.435121] [] ? 
insert_kthread_work+0x40/0x40 [481361.441300] Code: 66 90 66 90 49 89 c5 48 8b 47 18 48 85 c0 0f 84 b3 00 00 00 0f a3 18 19 db 85 db 41 0f 95 c6 45 31 ff 31 c0 0f 1f 44 00 00 f3 90 <41> 8b 5c 24 20 39 c3 74 5d 83 fb 02 74 68 83 fb 03 75 05 45 84 [481365.760978] Lustre: MGS: Received new LWP connection from 10.9.107.48@o2ib4, removing former export from same NID [481365.771319] Lustre: Skipped 196 previous similar messages [481366.860561] INFO: rcu_sched self-detected stall on CPU [481366.862560] INFO: rcu_sched detected stalls on CPUs/tasks: [481366.862561] { [481366.862562] 28 [481366.862565] } [481366.862576] (detected by 15, t=60002 jiffies, g=115591742, c=115591741, q=419405) [481366.862577] Task dump for CPU 28: [481366.862579] ldlm_bl_03 R [481366.862579] running task [481366.862580] 0 55962 2 0x00000088 [481366.862581] Call Trace: [481366.862625] [] ? ldlm_export_cancel_blocked_locks+0x121/0x200 [ptlrpc] [481366.862656] [] ? ldlm_bl_thread_main+0x112/0x700 [ptlrpc] [481366.862659] [] ? wake_up_state+0x20/0x20 [481366.862688] [] ? ldlm_handle_bl_callback+0x530/0x530 [ptlrpc] [481366.862690] [] ? kthread+0xd1/0xe0 [481366.862691] [] ? insert_kthread_work+0x40/0x40 [481366.862695] [] ? ret_from_fork_nospec_begin+0xe/0x21 [481366.862696] [] ? insert_kthread_work+0x40/0x40 [481366.943736] { [481366.945521] 28} (t=60085 jiffies g=115591742 c=115591741 q=419703) [481366.950634] Task dump for CPU 28: [481366.954040] ldlm_bl_03 R running task 0 55962 2 0x00000088 [481366.961225] Call Trace: [481366.963766] [] sched_show_task+0xa8/0x110 [481366.970154] [] dump_cpu_task+0x39/0x70 [481366.975643] [] rcu_dump_cpu_stacks+0x90/0xd0 [481366.981645] [] rcu_check_callbacks+0x442/0x730 [481366.987826] [] ? 
tick_sched_do_timer+0x50/0x50 [481366.994005] [] update_process_times+0x46/0x80 [481367.000096] [] tick_sched_handle+0x30/0x70 [481367.005931] [] tick_sched_timer+0x39/0x80 [481367.011677] [] __hrtimer_run_queues+0xf3/0x270 [481367.017856] [] hrtimer_interrupt+0xaf/0x1d0 [481367.023777] [] local_apic_timer_interrupt+0x3b/0x60 [481367.030389] [] smp_apic_timer_interrupt+0x43/0x60 [481367.036828] [] apic_timer_interrupt+0x162/0x170 [481367.043092] [] ? ldlm_inodebits_compat_queue+0x19f/0x440 [ptlrpc] [481367.051635] [] ldlm_process_inodebits_lock+0x93/0x3d0 [ptlrpc] [481367.059235] [] ? ldlm_inodebits_compat_queue+0x440/0x440 [ptlrpc] [481367.067085] [] ldlm_reprocess_queue+0x1be/0x3f0 [ptlrpc] [481367.074156] [] __ldlm_reprocess_all+0x10b/0x380 [ptlrpc] [481367.081229] [] ldlm_cancel_lock_for_export.isra.26+0x1c2/0x390 [ptlrpc] [481367.089601] [] ldlm_export_cancel_blocked_locks+0x121/0x200 [ptlrpc] [481367.097715] [] ldlm_bl_thread_main+0x112/0x700 [ptlrpc] [481367.104671] [] ? wake_up_state+0x20/0x20 [481367.110361] [] ? ldlm_handle_bl_callback+0x530/0x530 [ptlrpc] [481367.117836] [] kthread+0xd1/0xe0 [481367.122804] [] ? insert_kthread_work+0x40/0x40 [481367.128983] [] ret_from_fork_nospec_begin+0xe/0x21 [481367.135509] [] ? insert_kthread_work+0x40/0x40 [481369.232623] NMI watchdog: BUG: soft lockup - CPU#14 stuck for 23s! 
[ldlm_bl_01:54053] [481369.240536] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin dcdbas amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ses dm_multipath ipmi_si enclosure pcspkr dm_mod sg ipmi_devintf ccp i2c_piix4 ipmi_msghandler k10temp acpi_power_meter knem(OE) ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif [481369.313535] crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper ahci syscopyarea mlx5_core(OE) sysfillrect sysimgblt libahci mlxfw(OE) fb_sys_fops devlink ttm crct10dif_pclmul tg3 crct10dif_common mlx_compat(OE) drm megaraid_sas crc32c_intel libata ptp drm_panel_orientation_quirks pps_core mpt3sas(OE) raid_class scsi_transport_sas [last unloaded: libcfs] [481369.347219] CPU: 14 PID: 54053 Comm: ldlm_bl_01 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1 [481369.359638] Hardware name: Dell Inc. 
PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018 [481369.367290] task: ffff8967f34730c0 ti: ffff89979e6c8000 task.ti: ffff89979e6c8000 [481369.374857] RIP: 0010:[] [] ldlm_inodebits_compat_queue+0x188/0x440 [ptlrpc] [481369.385264] RSP: 0018:ffff89979e6cbbb0 EFLAGS: 00000282 [481369.390663] RAX: 0000000000000101 RBX: ffff89970bfd31e0 RCX: ffff89970bfd31e0 [481369.397884] RDX: ffff89979e6cbca8 RSI: ffff8970f4a33840 RDI: ffff896c7f632880 [481369.405103] RBP: ffff89979e6cbc08 R08: ffff89979e6cbca8 R09: 00000000c0009415 [481369.412323] R10: 0000000000000015 R11: ffff8970f4a33840 R12: ffff89979e6cbca8 [481369.419542] R13: 00000000c0009415 R14: 0000000000000015 R15: ffff8970f4a33840 [481369.426761] FS: 00007f90f5793740(0000) GS:ffff8987ff6c0000(0000) knlGS:0000000000000000 [481369.434935] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [481369.440766] CR2: 00007f90f39a5620 CR3: 0000003dc7410000 CR4: 00000000003407e0 [481369.447986] Call Trace: [481369.450566] [] ldlm_process_inodebits_lock+0x93/0x3d0 [ptlrpc] [481369.458170] [] ? ldlm_inodebits_compat_queue+0x440/0x440 [ptlrpc] [481369.466027] [] ldlm_reprocess_queue+0x1be/0x3f0 [ptlrpc] [481369.473098] [] __ldlm_reprocess_all+0x10b/0x380 [ptlrpc] [481369.480168] [] ldlm_cancel_lock_for_export.isra.26+0x1c2/0x390 [ptlrpc] [481369.488542] [] ldlm_export_cancel_blocked_locks+0x121/0x200 [ptlrpc] [481369.496662] [] ldlm_bl_thread_main+0x112/0x700 [ptlrpc] [481369.503622] [] ? wake_up_state+0x20/0x20 [481369.509311] [] ? ldlm_handle_bl_callback+0x530/0x530 [ptlrpc] [481369.516795] [] kthread+0xd1/0xe0 [481369.521760] [] ? insert_kthread_work+0x40/0x40 [481369.527944] [] ret_from_fork_nospec_begin+0xe/0x21 [481369.534475] [] ? 
insert_kthread_work+0x40/0x40 [481369.540652] Code: 74 0e 4c 89 f2 48 89 de 4c 89 ff e8 d3 a2 fc ff 49 8b 87 10 02 00 00 4c 8d a8 f0 fd ff ff 4d 39 fd 74 2b 49 83 bd a8 00 00 00 00 <74> 0e 4c 89 f2 48 89 de 4c 89 ef e8 a8 a2 fc ff 4d 8b ad 10 02 [481374.918783] LustreError: 54055:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 99s: evicting client at 10.9.102.25@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8973657e69c0/0x1a4b7ac7396f8930 lrc: 3/0,0 mode: PR/PR res: [0x2c0002b3f:0x68d6:0x0].0x0 bits 0x40/0x0 rrc: 79634 type: IBT flags: 0x60000400000020 nid: 10.9.102.25@o2ib4 remote: 0xee0dd17e9c8b3c00 expref: 79644 pid: 54474 timeout: 481362 lvb_type: 0 [481374.957311] LustreError: 54055:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 11244 previous similar messages [481376.229396] LustreError: 54692:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff89749aee7200 x1624928591123664/t0(0) o104->fir-MDT0002@10.9.102.26@o2ib4:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 [481376.250598] LustreError: 54692:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 7663 previous similar messages [481378.415073] LustreError: 55340:0:(sec.c:2362:sptlrpc_svc_unwrap_bulk()) @@@ truncated bulk GET 0(131072) req@ffff89843dac6050 x1624759240517808/t0(0) o4->4f6b42bd-8393-db19-0238-71ebc8ff53fb@10.8.29.1@o2ib6:297/0 lens 488/448 e 0 to 0 dl 1549998687 ref 1 fl Interpret:/0/0 rc 0/0 [481378.439917] Lustre: fir-MDT0000: Bulk IO write error with 4f6b42bd-8393-db19-0238-71ebc8ff53fb (at 10.8.29.1@o2ib6), client will retry: rc = -110 [481381.275923] NMI watchdog: BUG: soft lockup - CPU#19 stuck for 23s! 
[ldlm_bl_04:56043] [481381.283837] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin dcdbas amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ses dm_multipath ipmi_si enclosure pcspkr dm_mod sg ipmi_devintf ccp i2c_piix4 ipmi_msghandler k10temp acpi_power_meter knem(OE) ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif [481381.356838] crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper ahci syscopyarea mlx5_core(OE) sysfillrect sysimgblt libahci mlxfw(OE) fb_sys_fops devlink ttm crct10dif_pclmul tg3 crct10dif_common mlx_compat(OE) drm megaraid_sas crc32c_intel libata ptp drm_panel_orientation_quirks pps_core mpt3sas(OE) raid_class scsi_transport_sas [last unloaded: libcfs] [481381.390520] CPU: 19 PID: 56043 Comm: ldlm_bl_04 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1 [481381.402937] Hardware name: Dell Inc. 
PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018
[481381.410590] task: ffff8997eeaca080 ti: ffff8997f7c48000 task.ti: ffff8997f7c48000
[481381.418155] RIP: 0010:[] [] ldlm_inodebits_compat_queue+0x188/0x440 [ptlrpc]
[481381.428564] RSP: 0018:ffff8997f7c4bbb0 EFLAGS: 00000282
[481381.433965] RAX: 0000000000000101 RBX: ffff896dcaecbf60 RCX: ffff896dcaecbf60
[481381.441183] RDX: ffff8997f7c4bca8 RSI: ffff896ebe26e540 RDI: ffff8984fa6a69c0
[481381.448402] RBP: ffff8997f7c4bc08 R08: ffff8997f7c4bca8 R09: 00000000c0009749
[481381.455623] R10: 0000000000000049 R11: ffff896ebe26e540 R12: ffff8997f7c4bca8
[481381.462843] R13: 00000000c0009749 R14: 0000000000000049 R15: ffff896ebe26e540
[481381.470062] FS: 00007f9d12b30900(0000) GS:ffff89983f500000(0000) knlGS:0000000000000000
[481381.478235] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[481381.484069] CR2: 00007fc038f73090 CR3: 0000003dc7410000 CR4: 00000000003407e0
[481381.491288] Call Trace:
[481381.493868] [] ldlm_process_inodebits_lock+0x93/0x3d0 [ptlrpc]
[481381.501470] [] ? ldlm_inodebits_compat_queue+0x440/0x440 [ptlrpc]
[481381.509328] [] ldlm_reprocess_queue+0x1be/0x3f0 [ptlrpc]
[481381.516409] [] __ldlm_reprocess_all+0x10b/0x380 [ptlrpc]
[481381.523489] [] ldlm_cancel_lock_for_export.isra.26+0x1c2/0x390 [ptlrpc]
[481381.531861] [] ldlm_export_cancel_blocked_locks+0x121/0x200 [ptlrpc]
[481381.539975] [] ldlm_bl_thread_main+0x112/0x700 [ptlrpc]
[481381.546940] [] ? wake_up_state+0x20/0x20
[481381.552628] [] ? ldlm_handle_bl_callback+0x530/0x530 [ptlrpc]
[481381.560112] [] kthread+0xd1/0xe0
[481381.565076] [] ? insert_kthread_work+0x40/0x40
[481381.571259] [] ret_from_fork_nospec_begin+0xe/0x21
[481381.577785] [] ? insert_kthread_work+0x40/0x40
[481381.583963] Code: 74 0e 4c 89 f2 48 89 de 4c 89 ff e8 d3 a2 fc ff 49 8b 87 10 02 00 00 4c 8d a8 f0 fd ff ff 4d 39 fd 74 2b 49 83 bd a8 00 00 00 00 <74> 0e 4c 89 f2 48 89 de 4c 89 ef e8 a8 a2 fc ff 4d 8b ad 10 02
[481389.173119] NMI watchdog: BUG: soft lockup - CPU#7 stuck for 22s! [migration/7:47]
[481389.180774] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin dcdbas amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ses dm_multipath ipmi_si enclosure pcspkr dm_mod sg ipmi_devintf ccp i2c_piix4 ipmi_msghandler k10temp acpi_power_meter knem(OE) ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif
[481389.253773] crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper ahci syscopyarea mlx5_core(OE) sysfillrect sysimgblt libahci mlxfw(OE) fb_sys_fops devlink ttm crct10dif_pclmul tg3 crct10dif_common mlx_compat(OE) drm megaraid_sas crc32c_intel libata ptp drm_panel_orientation_quirks pps_core mpt3sas(OE) raid_class scsi_transport_sas [last unloaded: libcfs]
[481389.287456] CPU: 7 PID: 47 Comm: migration/7 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1
[481389.299613] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018
[481389.307267] task: ffff8988a9fc5140 ti: ffff8988a9ff8000 task.ti: ffff8988a9ff8000
[481389.314832] RIP: 0010:[] [] multi_cpu_stop+0x4a/0x110
[481389.323205] RSP: 0000:ffff8988a9ffbd98 EFLAGS: 00000246
[481389.328604] RAX: 0000000000000001 RBX: ffff8988a9619800 RCX: dead000000000200
[481389.335823] RDX: ffff89983f455ff0 RSI: 0000000000000282 RDI: ffff8987d03cbab0
[481389.343043] RBP: ffff8988a9ffbdc0 R08: ffff8987d03cba50 R09: 0000000000000001
[481389.350263] R10: 000000000000babf R11: 0000000000000001 R12: ffff8988a9ffbd20
[481389.357483] R13: ffff8967d9775140 R14: ffffffff9d6d2eb2 R15: ffff8988a9ffbd00
[481389.364703] FS: 00007fa72ea77700(0000) GS:ffff89983f440000(0000) knlGS:0000000000000000
[481389.372874] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[481389.378707] CR2: 00007fa720095060 CR3: 000000300da98000 CR4: 00000000003407e0
[481389.385928] Call Trace:
[481389.388468] [] ? cpu_stop_should_run+0x50/0x50
[481389.394646] [] cpu_stopper_thread+0x99/0x150
[481389.400654] [] ? __schedule+0x3ff/0x890
[481389.406226] [] smpboot_thread_fn+0x144/0x1a0
[481389.412231] [] ? lg_double_unlock+0x40/0x40
[481389.418152] [] kthread+0xd1/0xe0
[481389.423117] [] ? insert_kthread_work+0x40/0x40
[481389.429297] [] ret_from_fork_nospec_begin+0xe/0x21
[481389.435821] [] ? insert_kthread_work+0x40/0x40
[481389.442002] Code: 66 90 66 90 49 89 c5 48 8b 47 18 48 85 c0 0f 84 b3 00 00 00 0f a3 18 19 db 85 db 41 0f 95 c6 45 31 ff 31 c0 0f 1f 44 00 00 f3 90 <41> 8b 5c 24 20 39 c3 74 5d 83 fb 02 74 68 83 fb 03 75 05 45 84
[481391.190179] LNet: Service thread pid 54692 was inactive for 200.34s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[481391.207200] Pid: 54692, comm: mdt01_020 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
[481391.217027] Call Trace:
[481391.219575] [] 0xffffffffffffffff
[481391.224696] LustreError: dumping log to /tmp/lustre-log.1549998679.54692
[481393.354228] NMI watchdog: BUG: soft lockup - CPU#28 stuck for 22s! [ldlm_bl_03:55962]
[481393.362141] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin dcdbas amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ses dm_multipath ipmi_si enclosure pcspkr dm_mod sg ipmi_devintf ccp i2c_piix4 ipmi_msghandler k10temp acpi_power_meter knem(OE) ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif
[481393.435169] crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper ahci syscopyarea mlx5_core(OE) sysfillrect sysimgblt libahci mlxfw(OE) fb_sys_fops devlink ttm crct10dif_pclmul tg3 crct10dif_common mlx_compat(OE) drm megaraid_sas crc32c_intel libata ptp drm_panel_orientation_quirks pps_core mpt3sas(OE) raid_class scsi_transport_sas [last unloaded: libcfs]
[481393.468851] CPU: 28 PID: 55962 Comm: ldlm_bl_03 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1
[481393.481269] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018
[481393.488923] task: ffff8997eeacd140 ti: ffff8997fb308000 task.ti: ffff8997fb308000
[481393.496489] RIP: 0010:[] [] ldlm_inodebits_compat_queue+0x188/0x440 [ptlrpc]
[481393.506896] RSP: 0018:ffff8997fb30bbb0 EFLAGS: 00000282
[481393.512295] RAX: 0000000000000101 RBX: ffff89752aeabd20 RCX: ffff89752aeabd20
[481393.519515] RDX: ffff8997fb30bca8 RSI: ffff8995f523f740 RDI: ffff89868afe6540
[481393.526734] RBP: ffff8997fb30bc08 R08: ffff8997fb30bca8 R09: 00000000c0005f35
[481393.533954] R10: 0000000000000035 R11: ffff8995f523f740 R12: ffff8997fb30bca8
[481393.541174] R13: 00000000c0005f35 R14: 0000000000000035 R15: ffff8995f523f740
[481393.548394] FS: 00007fb564ea5700(0000) GS:ffff8967fefc0000(0000) knlGS:0000000000000000
[481393.556567] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[481393.562399] CR2: 00007fc038f6a000 CR3: 0000003dc7410000 CR4: 00000000003407e0
[481393.569620] Call Trace:
[481393.572199] [] ldlm_process_inodebits_lock+0x93/0x3d0 [ptlrpc]
[481393.579799] [] ? ldlm_export_lock_object+0x10/0x10 [ptlrpc]
[481393.587141] [] ? ldlm_inodebits_compat_queue+0x440/0x440 [ptlrpc]
[481393.594999] [] ldlm_reprocess_queue+0x1be/0x3f0 [ptlrpc]
[481393.602072] [] __ldlm_reprocess_all+0x10b/0x380 [ptlrpc]
[481393.609141] [] ldlm_cancel_lock_for_export.isra.26+0x1c2/0x390 [ptlrpc]
[481393.617516] [] ldlm_export_cancel_blocked_locks+0x121/0x200 [ptlrpc]
[481393.625628] [] ldlm_bl_thread_main+0x112/0x700 [ptlrpc]
[481393.632587] [] ? wake_up_state+0x20/0x20
[481393.638275] [] ? ldlm_handle_bl_callback+0x530/0x530 [ptlrpc]
[481393.645758] [] kthread+0xd1/0xe0
[481393.650724] [] ? insert_kthread_work+0x40/0x40
[481393.656907] [] ret_from_fork_nospec_begin+0xe/0x21
[481393.663430] [] ? insert_kthread_work+0x40/0x40
[481393.669607] Code: 74 0e 4c 89 f2 48 89 de 4c 89 ff e8 d3 a2 fc ff 49 8b 87 10 02 00 00 4c 8d a8 f0 fd ff ff 4d 39 fd 74 2b 49 83 bd a8 00 00 00 00 <74> 0e 4c 89 f2 48 89 de 4c 89 ef e8 a8 a2 fc ff 4d 8b ad 10 02
[481397.233324] NMI watchdog: BUG: soft lockup - CPU#14 stuck for 23s! [ldlm_bl_01:54053]
[481397.241236] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin dcdbas amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ses dm_multipath ipmi_si enclosure pcspkr dm_mod sg ipmi_devintf ccp i2c_piix4 ipmi_msghandler k10temp acpi_power_meter knem(OE) ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif
[481397.314236] crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper ahci syscopyarea mlx5_core(OE) sysfillrect sysimgblt libahci mlxfw(OE) fb_sys_fops devlink ttm crct10dif_pclmul tg3 crct10dif_common mlx_compat(OE) drm megaraid_sas crc32c_intel libata ptp drm_panel_orientation_quirks pps_core mpt3sas(OE) raid_class scsi_transport_sas [last unloaded: libcfs]
[481397.347920] CPU: 14 PID: 54053 Comm: ldlm_bl_01 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1
[481397.360337] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018
[481397.367989] task: ffff8967f34730c0 ti: ffff89979e6c8000 task.ti: ffff89979e6c8000
[481397.375556] RIP: 0010:[] [] ldlm_inodebits_compat_queue+0x188/0x440 [ptlrpc]
[481397.385964] RSP: 0018:ffff89979e6cbbb0 EFLAGS: 00000282
[481397.391364] RAX: 0000000000000101 RBX: ffff89970bfd31e0 RCX: ffff89970bfd31e0
[481397.398583] RDX: ffff89979e6cbca8 RSI: ffff8970f4a33840 RDI: ffff8993fb2bad00
[481397.405803] RBP: ffff89979e6cbc08 R08: ffff89979e6cbca8 R09: 00000000c0009415
[481397.413023] R10: 0000000000000015 R11: ffff8970f4a33840 R12: ffff89979e6cbca8
[481397.420242] R13: 00000000c0009415 R14: 0000000000000015 R15: ffff8970f4a33840
[481397.427461] FS: 00007f90f5793740(0000) GS:ffff8987ff6c0000(0000) knlGS:0000000000000000
[481397.435634] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[481397.441466] CR2: 00007f90f39a5620 CR3: 0000003dc7410000 CR4: 00000000003407e0
[481397.448688] Call Trace:
[481397.451265] [] ldlm_process_inodebits_lock+0x93/0x3d0 [ptlrpc]
[481397.458857] [] ? ldlm_inodebits_compat_queue+0x440/0x440 [ptlrpc]
[481397.466706] [] ldlm_reprocess_queue+0x1be/0x3f0 [ptlrpc]
[481397.473779] [] __ldlm_reprocess_all+0x10b/0x380 [ptlrpc]
[481397.480850] [] ldlm_cancel_lock_for_export.isra.26+0x1c2/0x390 [ptlrpc]
[481397.489225] [] ldlm_export_cancel_blocked_locks+0x121/0x200 [ptlrpc]
[481397.497339] [] ldlm_bl_thread_main+0x112/0x700 [ptlrpc]
[481397.504307] [] ? wake_up_state+0x20/0x20
[481397.509992] [] ? ldlm_handle_bl_callback+0x530/0x530 [ptlrpc]
[481397.517468] [] kthread+0xd1/0xe0
[481397.522435] [] ? insert_kthread_work+0x40/0x40
[481397.528614] [] ret_from_fork_nospec_begin+0xe/0x21
[481397.535140] [] ? insert_kthread_work+0x40/0x40
[481397.541318] Code: 74 0e 4c 89 f2 48 89 de 4c 89 ff e8 d3 a2 fc ff 49 8b 87 10 02 00 00 4c 8d a8 f0 fd ff ff 4d 39 fd 74 2b 49 83 bd a8 00 00 00 00 <74> 0e 4c 89 f2 48 89 de 4c 89 ef e8 a8 a2 fc ff 4d 8b ad 10 02
[481397.926309] LustreError: 54698:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8997eb61c400 ns: mdt-fir-MDT0000_UUID lock: ffff8957d5045340/0x1a4b7ac741a69d3c lrc: 3/0,0 mode: EX/EX res: [0x200003701:0x4cc0:0x0].0x0 bits 0x8/0x0 rrc: 3 type: IBT flags: 0x50000000000000 nid: 10.9.107.63@o2ib4 remote: 0x950da5ac575b0bbb expref: 2826 pid: 54698 timeout: 0 lvb_type: 3
[481409.276626] NMI watchdog: BUG: soft lockup - CPU#19 stuck for 23s! [ldlm_bl_04:56043]
[481409.284543] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin dcdbas amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ses dm_multipath ipmi_si enclosure pcspkr dm_mod sg ipmi_devintf ccp i2c_piix4 ipmi_msghandler k10temp acpi_power_meter knem(OE) ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif
[481409.357545] crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper ahci syscopyarea mlx5_core(OE) sysfillrect sysimgblt libahci mlxfw(OE) fb_sys_fops devlink ttm crct10dif_pclmul tg3 crct10dif_common mlx_compat(OE) drm megaraid_sas crc32c_intel libata ptp drm_panel_orientation_quirks pps_core mpt3sas(OE) raid_class scsi_transport_sas [last unloaded: libcfs]
[481409.391226] CPU: 19 PID: 56043 Comm: ldlm_bl_04 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1
[481409.403645] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018
[481409.411298] task: ffff8997eeaca080 ti: ffff8997f7c48000 task.ti: ffff8997f7c48000
[481409.418865] RIP: 0010:[] [] ldlm_inodebits_compat_queue+0x188/0x440 [ptlrpc]
[481409.429272] RSP: 0018:ffff8997f7c4bbb0 EFLAGS: 00000282
[481409.434672] RAX: 0000000000000101 RBX: ffff896dcaecbf60 RCX: ffff896dcaecbf60
[481409.441892] RDX: ffff8997f7c4bca8 RSI: ffff896ebe26e540 RDI: ffff8995e03efbc0
[481409.449111] RBP: ffff8997f7c4bc08 R08: ffff8997f7c4bca8 R09: 00000000c0009749
[481409.456331] R10: 0000000000000049 R11: ffff896ebe26e540 R12: ffff8997f7c4bca8
[481409.463550] R13: 00000000c0009749 R14: 0000000000000049 R15: ffff896ebe26e540
[481409.470772] FS: 00007f9d12b30900(0000) GS:ffff89983f500000(0000) knlGS:0000000000000000
[481409.478943] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[481409.484774] CR2: 00007fc038f73090 CR3: 0000003dc7410000 CR4: 00000000003407e0
[481409.491996] Call Trace:
[481409.494575] [] ldlm_process_inodebits_lock+0x93/0x3d0 [ptlrpc]
[481409.502179] [] ? ldlm_inodebits_compat_queue+0x440/0x440 [ptlrpc]
[481409.510036] [] ldlm_reprocess_queue+0x1be/0x3f0 [ptlrpc]
[481409.517115] [] __ldlm_reprocess_all+0x10b/0x380 [ptlrpc]
[481409.524188] [] ldlm_cancel_lock_for_export.isra.26+0x1c2/0x390 [ptlrpc]
[481409.532561] [] ldlm_export_cancel_blocked_locks+0x121/0x200 [ptlrpc]
[481409.540676] [] ldlm_bl_thread_main+0x112/0x700 [ptlrpc]
[481409.547639] [] ? wake_up_state+0x20/0x20
[481409.553330] [] ? ldlm_handle_bl_callback+0x530/0x530 [ptlrpc]
[481409.560811] [] kthread+0xd1/0xe0
[481409.565777] [] ? insert_kthread_work+0x40/0x40
[481409.571958] [] ret_from_fork_nospec_begin+0xe/0x21
[481409.578483] [] ? insert_kthread_work+0x40/0x40
[481409.584661] Code: 74 0e 4c 89 f2 48 89 de 4c 89 ff e8 d3 a2 fc ff 49 8b 87 10 02 00 00 4c 8d a8 f0 fd ff ff 4d 39 fd 74 2b 49 83 bd a8 00 00 00 00 <74> 0e 4c 89 f2 48 89 de 4c 89 ef e8 a8 a2 fc ff 4d 8b ad 10 02
[481410.607907] Lustre: fir-MDT0000: haven't heard from client d389dc2f-a4ea-0582-21f5-cd6c5e60379e (at 10.8.13.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89711177d000, cur 1549998699 expire 1549998549 last 1549998472
[481410.629693] Lustre: Skipped 2 previous similar messages
[481417.173822] NMI watchdog: BUG: soft lockup - CPU#7 stuck for 22s! [migration/7:47]
[481417.181471] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin dcdbas amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ses dm_multipath ipmi_si enclosure pcspkr dm_mod sg ipmi_devintf ccp i2c_piix4 ipmi_msghandler k10temp acpi_power_meter knem(OE) ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif
[481417.254472] crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper ahci syscopyarea mlx5_core(OE) sysfillrect sysimgblt libahci mlxfw(OE) fb_sys_fops devlink ttm crct10dif_pclmul tg3 crct10dif_common mlx_compat(OE) drm megaraid_sas crc32c_intel libata ptp drm_panel_orientation_quirks pps_core mpt3sas(OE) raid_class scsi_transport_sas [last unloaded: libcfs]
[481417.288155] CPU: 7 PID: 47 Comm: migration/7 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1
[481417.300312] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018
[481417.307965] task: ffff8988a9fc5140 ti: ffff8988a9ff8000 task.ti: ffff8988a9ff8000
[481417.315532] RIP: 0010:[] [] multi_cpu_stop+0x4a/0x110
[481417.323905] RSP: 0000:ffff8988a9ffbd98 EFLAGS: 00000246
[481417.329302] RAX: 0000000000000001 RBX: ffff8988a9619800 RCX: dead000000000200
[481417.336521] RDX: ffff89983f455ff0 RSI: 0000000000000282 RDI: ffff8987d03cbab0
[481417.343743] RBP: ffff8988a9ffbdc0 R08: ffff8987d03cba50 R09: 0000000000000001
[481417.350961] R10: 000000000000babf R11: 0000000000000001 R12: ffff8988a9ffbd20
[481417.358180] R13: ffff8967d9775140 R14: ffffffff9d6d2eb2 R15: ffff8988a9ffbd00
[481417.365402] FS: 00007fa72ea77700(0000) GS:ffff89983f440000(0000) knlGS:0000000000000000
[481417.373575] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[481417.379406] CR2: 00007fa720095060 CR3: 000000300da98000 CR4: 00000000003407e0
[481417.386626] Call Trace:
[481417.389167] [] ? cpu_stop_should_run+0x50/0x50
[481417.395345] [] cpu_stopper_thread+0x99/0x150
[481417.401352] [] ? __schedule+0x3ff/0x890
[481417.406926] [] smpboot_thread_fn+0x144/0x1a0
[481417.412929] [] ? lg_double_unlock+0x40/0x40
[481417.418849] [] kthread+0xd1/0xe0
[481417.423815] [] ? insert_kthread_work+0x40/0x40
[481417.429996] [] ret_from_fork_nospec_begin+0xe/0x21
[481417.436522] [] ? insert_kthread_work+0x40/0x40
[481417.442700] Code: 66 90 66 90 49 89 c5 48 8b 47 18 48 85 c0 0f 84 b3 00 00 00 0f a3 18 19 db 85 db 41 0f 95 c6 45 31 ff 31 c0 0f 1f 44 00 00 f3 90 <41> 8b 5c 24 20 39 c3 74 5d 83 fb 02 74 68 83 fb 03 75 05 45 84
[481421.354929] NMI watchdog: BUG: soft lockup - CPU#28 stuck for 22s! [ldlm_bl_03:55962]
[481421.362841] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin dcdbas amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ses dm_multipath ipmi_si enclosure pcspkr dm_mod sg ipmi_devintf ccp i2c_piix4 ipmi_msghandler k10temp acpi_power_meter knem(OE) ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif
[481421.435841] crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper ahci syscopyarea mlx5_core(OE) sysfillrect sysimgblt libahci mlxfw(OE) fb_sys_fops devlink ttm crct10dif_pclmul tg3 crct10dif_common mlx_compat(OE) drm megaraid_sas crc32c_intel libata ptp drm_panel_orientation_quirks pps_core mpt3sas(OE) raid_class scsi_transport_sas [last unloaded: libcfs]
[481421.469523] CPU: 28 PID: 55962 Comm: ldlm_bl_03 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1
[481421.481940] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018
[481421.489593] task: ffff8997eeacd140 ti: ffff8997fb308000 task.ti: ffff8997fb308000
[481421.497159] RIP: 0010:[] [] ldlm_inodebits_compat_queue+0x188/0x440 [ptlrpc]
[481421.507568] RSP: 0018:ffff8997fb30bbb0 EFLAGS: 00000282
[481421.512967] RAX: 0000000000000101 RBX: ffff89752aeabd20 RCX: ffff89752aeabd20
[481421.520189] RDX: ffff8997fb30bca8 RSI: ffff8995f523f740 RDI: ffff8976ddf421c0
[481421.527407] RBP: ffff8997fb30bc08 R08: ffff8997fb30bca8 R09: 00000000c0005f35
[481421.534626] R10: 0000000000000035 R11: ffff8995f523f740 R12: ffff8997fb30bca8
[481421.541863] R13: 00000000c0005f35 R14: 0000000000000035 R15: ffff8995f523f740
[481421.549084] FS: 00007fb564ea5700(0000) GS:ffff8967fefc0000(0000) knlGS:0000000000000000
[481421.557258] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[481421.563089] CR2: 00007fc038f6a000 CR3: 0000003dc7410000 CR4: 00000000003407e0
[481421.570310] Call Trace:
[481421.572888] [] ldlm_process_inodebits_lock+0x93/0x3d0 [ptlrpc]
[481421.580490] [] ? ldlm_inodebits_compat_queue+0x440/0x440 [ptlrpc]
[481421.588347] [] ldlm_reprocess_queue+0x1be/0x3f0 [ptlrpc]
[481421.595421] [] __ldlm_reprocess_all+0x10b/0x380 [ptlrpc]
[481421.602492] [] ldlm_cancel_lock_for_export.isra.26+0x1c2/0x390 [ptlrpc]
[481421.610864] [] ldlm_export_cancel_blocked_locks+0x121/0x200 [ptlrpc]
[481421.618978] [] ldlm_bl_thread_main+0x112/0x700 [ptlrpc]
[481421.625945] [] ? wake_up_state+0x20/0x20
[481421.631634] [] ? ldlm_handle_bl_callback+0x530/0x530 [ptlrpc]
[481421.639117] [] kthread+0xd1/0xe0
[481421.644081] [] ? insert_kthread_work+0x40/0x40
[481421.650264] [] ret_from_fork_nospec_begin+0xe/0x21
[481421.656786] [] ? insert_kthread_work+0x40/0x40
[481421.662964] Code: 74 0e 4c 89 f2 48 89 de 4c 89 ff e8 d3 a2 fc ff 49 8b 87 10 02 00 00 4c 8d a8 f0 fd ff ff 4d 39 fd 74 2b 49 83 bd a8 00 00 00 00 <74> 0e 4c 89 f2 48 89 de 4c 89 ef e8 a8 a2 fc ff 4d 8b ad 10 02
[481422.934979] LNet: Service thread pid 63820 was inactive for 286.76s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[481422.951999] Pid: 63820, comm: mdt03_033 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
[481422.961823] Call Trace:
[481422.964363] [] 0xffffffffffffffff
[481422.969467] LustreError: dumping log to /tmp/lustre-log.1549998711.63820
[481425.234028] NMI watchdog: BUG: soft lockup - CPU#14 stuck for 23s! [ldlm_bl_01:54053]
[481425.241945] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin dcdbas amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ses dm_multipath ipmi_si enclosure pcspkr dm_mod sg ipmi_devintf ccp i2c_piix4 ipmi_msghandler k10temp acpi_power_meter knem(OE) ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif
[481425.314944] crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper ahci syscopyarea mlx5_core(OE) sysfillrect sysimgblt libahci mlxfw(OE) fb_sys_fops devlink ttm crct10dif_pclmul tg3 crct10dif_common mlx_compat(OE) drm megaraid_sas crc32c_intel libata ptp drm_panel_orientation_quirks pps_core mpt3sas(OE) raid_class scsi_transport_sas [last unloaded: libcfs]
[481425.348627] CPU: 14 PID: 54053 Comm: ldlm_bl_01 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1
[481425.361043] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018
[481425.368698] task: ffff8967f34730c0 ti: ffff89979e6c8000 task.ti: ffff89979e6c8000
[481425.376264] RIP: 0010:[] [] ldlm_inodebits_compat_queue+0x188/0x440 [ptlrpc]
[481425.386672] RSP: 0018:ffff89979e6cbbb0 EFLAGS: 00000282
[481425.392073] RAX: 0000000000000101 RBX: ffff89970bfd31e0 RCX: ffff89970bfd31e0
[481425.399290] RDX: ffff89979e6cbca8 RSI: ffff8970f4a33840 RDI: ffff897cb432a1c0
[481425.406511] RBP: ffff89979e6cbc08 R08: ffff89979e6cbca8 R09: 00000000c0009415
[481425.413730] R10: 0000000000000015 R11: ffff8970f4a33840 R12: ffff89979e6cbca8
[481425.420949] R13: 00000000c0009415 R14: 0000000000000015 R15: ffff8970f4a33840
[481425.428171] FS: 00007f90f5793740(0000) GS:ffff8987ff6c0000(0000) knlGS:0000000000000000
[481425.436342] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[481425.442175] CR2: 00007f90f39a5620 CR3: 0000003dc7410000 CR4: 00000000003407e0
[481425.449393] Call Trace:
[481425.451983] [] ldlm_process_inodebits_lock+0x93/0x3d0 [ptlrpc]
[481425.459584] [] ? ldlm_export_lock_object+0x10/0x10 [ptlrpc]
[481425.466922] [] ? ldlm_inodebits_compat_queue+0x440/0x440 [ptlrpc]
[481425.474774] [] ldlm_reprocess_queue+0x1be/0x3f0 [ptlrpc]
[481425.481845] [] __ldlm_reprocess_all+0x10b/0x380 [ptlrpc]
[481425.488918] [] ldlm_cancel_lock_for_export.isra.26+0x1c2/0x390 [ptlrpc]
[481425.497289] [] ldlm_export_cancel_blocked_locks+0x121/0x200 [ptlrpc]
[481425.505404] [] ldlm_bl_thread_main+0x112/0x700 [ptlrpc]
[481425.512362] [] ? wake_up_state+0x20/0x20
[481425.518052] [] ? ldlm_handle_bl_callback+0x530/0x530 [ptlrpc]
[481425.525535] [] kthread+0xd1/0xe0
[481425.530499] [] ? insert_kthread_work+0x40/0x40
[481425.536682] [] ret_from_fork_nospec_begin+0xe/0x21
[481425.543206] [] ? insert_kthread_work+0x40/0x40
[481425.549384] Code: 74 0e 4c 89 f2 48 89 de 4c 89 ff e8 d3 a2 fc ff 49 8b 87 10 02 00 00 4c 8d a8 f0 fd ff ff 4d 39 fd 74 2b 49 83 bd a8 00 00 00 00 <74> 0e 4c 89 f2 48 89 de 4c 89 ef e8 a8 a2 fc ff 4d 8b ad 10 02
[481437.277328] NMI watchdog: BUG: soft lockup - CPU#19 stuck for 22s! [ldlm_bl_04:56043]
[481437.285244] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin dcdbas amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ses dm_multipath ipmi_si enclosure pcspkr dm_mod sg ipmi_devintf ccp i2c_piix4 ipmi_msghandler k10temp acpi_power_meter knem(OE) ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif
[481437.358244] crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper ahci syscopyarea mlx5_core(OE) sysfillrect sysimgblt libahci mlxfw(OE) fb_sys_fops devlink ttm crct10dif_pclmul tg3 crct10dif_common mlx_compat(OE) drm megaraid_sas crc32c_intel libata ptp drm_panel_orientation_quirks pps_core mpt3sas(OE) raid_class scsi_transport_sas [last unloaded: libcfs]
[481437.391925] CPU: 19 PID: 56043 Comm: ldlm_bl_04 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1
[481437.404344] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018
[481437.411997] task: ffff8997eeaca080 ti: ffff8997f7c48000 task.ti: ffff8997f7c48000
[481437.419563] RIP: 0010:[] [] ldlm_inodebits_compat_queue+0x188/0x440 [ptlrpc]
[481437.429971] RSP: 0018:ffff8997f7c4bbb0 EFLAGS: 00000282
[481437.435372] RAX: 0000000000000101 RBX: ffff896dcaecbf60 RCX: ffff896dcaecbf60
[481437.442591] RDX: ffff8997f7c4bca8 RSI: ffff896ebe26e540 RDI: ffff8958048733c0
[481437.449811] RBP: ffff8997f7c4bc08 R08: ffff8997f7c4bca8 R09: 00000000c0009749
[481437.457029] R10: 0000000000000049 R11: ffff896ebe26e540 R12: ffff8997f7c4bca8
[481437.464250] R13: 00000000c0009749 R14: 0000000000000049 R15: ffff896ebe26e540
[481437.471469] FS: 00007f9d12b30900(0000) GS:ffff89983f500000(0000) knlGS:0000000000000000
[481437.479640] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[481437.485474] CR2: 00007fc038f73090 CR3: 0000003dc7410000 CR4: 00000000003407e0
[481437.492695] Call Trace:
[481437.495269] [] ldlm_process_inodebits_lock+0x93/0x3d0 [ptlrpc]
[481437.502867] [] ? ldlm_inodebits_compat_queue+0x440/0x440 [ptlrpc]
[481437.510726] [] ldlm_reprocess_queue+0x1be/0x3f0 [ptlrpc]
[481437.517803] [] __ldlm_reprocess_all+0x10b/0x380 [ptlrpc]
[481437.524886] [] ldlm_cancel_lock_for_export.isra.26+0x1c2/0x390 [ptlrpc]
[481437.533267] [] ldlm_export_cancel_blocked_locks+0x121/0x200 [ptlrpc]
[481437.541382] [] ldlm_bl_thread_main+0x112/0x700 [ptlrpc]
[481437.548346] [] ? wake_up_state+0x20/0x20
[481437.554035] [] ? ldlm_handle_bl_callback+0x530/0x530 [ptlrpc]
[481437.561519] [] kthread+0xd1/0xe0
[481437.566484] [] ? insert_kthread_work+0x40/0x40
[481437.572666] [] ret_from_fork_nospec_begin+0xe/0x21
[481437.579189] [] ? insert_kthread_work+0x40/0x40
[481437.585369] Code: 74 0e 4c 89 f2 48 89 de 4c 89 ff e8 d3 a2 fc ff 49 8b 87 10 02 00 00 4c 8d a8 f0 fd ff ff 4d 39 fd 74 2b 49 83 bd a8 00 00 00 00 <74> 0e 4c 89 f2 48 89 de 4c 89 ef e8 a8 a2 fc ff 4d 8b ad 10 02
[481440.238335] LustreError: 56062:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff897172ae0600 x1624928591540768/t0(0) o104->fir-MDT0002@10.9.102.25@o2ib4:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
[481440.259512] LustreError: 56062:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 11749 previous similar messages
[481442.395658] Lustre: DEBUG MARKER: Tue Feb 12 11:12:10 2019
[481445.174524] NMI watchdog: BUG: soft lockup - CPU#7 stuck for 22s! [migration/7:47]
[481445.182181] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin dcdbas amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ses dm_multipath ipmi_si enclosure pcspkr dm_mod sg ipmi_devintf ccp i2c_piix4 ipmi_msghandler k10temp acpi_power_meter knem(OE) ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif
[481445.255181] crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper ahci syscopyarea mlx5_core(OE) sysfillrect sysimgblt libahci mlxfw(OE) fb_sys_fops devlink ttm crct10dif_pclmul tg3 crct10dif_common mlx_compat(OE) drm megaraid_sas crc32c_intel libata ptp drm_panel_orientation_quirks pps_core mpt3sas(OE) raid_class scsi_transport_sas [last unloaded: libcfs]
[481445.288863] CPU: 7 PID: 47 Comm: migration/7 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1
[481445.301021] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018
[481445.308673] task: ffff8988a9fc5140 ti: ffff8988a9ff8000 task.ti: ffff8988a9ff8000
[481445.316240] RIP: 0010:[] [] multi_cpu_stop+0x4a/0x110
[481445.324612] RSP: 0000:ffff8988a9ffbd98 EFLAGS: 00000246
[481445.330009] RAX: 0000000000000001 RBX: ffff8988a9619800 RCX: dead000000000200
[481445.337231] RDX: ffff89983f455ff0 RSI: 0000000000000282 RDI: ffff8987d03cbab0
[481445.344450] RBP: ffff8988a9ffbdc0 R08: ffff8987d03cba50 R09: 0000000000000001
[481445.351670] R10: 000000000000babf R11: 0000000000000001 R12: ffff8988a9ffbd20
[481445.358888] R13: ffff8967d9775140 R14: ffffffff9d6d2eb2 R15: ffff8988a9ffbd00
[481445.366109] FS: 00007fa72ea77700(0000) GS:ffff89983f440000(0000) knlGS:0000000000000000
[481445.374281] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[481445.380113] CR2: 00007fa720095060 CR3: 000000300da98000 CR4: 00000000003407e0
[481445.387335] Call Trace:
[481445.389875] [] ? cpu_stop_should_run+0x50/0x50
[481445.396053] [] cpu_stopper_thread+0x99/0x150
[481445.402060] [] ? __schedule+0x3ff/0x890
[481445.407634] [] smpboot_thread_fn+0x144/0x1a0
[481445.413638] [] ? lg_double_unlock+0x40/0x40
[481445.419558] [] kthread+0xd1/0xe0
[481445.424525] [] ? insert_kthread_work+0x40/0x40
[481445.430703] [] ret_from_fork_nospec_begin+0xe/0x21
[481445.437230] [] ? insert_kthread_work+0x40/0x40
[481445.443409] Code: 66 90 66 90 49 89 c5 48 8b 47 18 48 85 c0 0f 84 b3 00 00 00 0f a3 18 19 db 85 db 41 0f 95 c6 45 31 ff 31 c0 0f 1f 44 00 00 f3 90 <41> 8b 5c 24 20 39 c3 74 5d 83 fb 02 74 68 83 fb 03 75 05 45 84
[481448.920707] LustreError: 54055:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 128s: evicting client at 10.9.103.35@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8997227d3cc0/0x1a4b7ac73a6ac5b5 lrc: 3/0,0 mode: PR/PR res: [0x2c00033c7:0x2b6e:0x0].0x0 bits 0x40/0x0 rrc: 103878 type: IBT flags: 0x60000400010020 nid: 10.9.103.35@o2ib4 remote: 0x8299540b6fa008a0 expref: 104100 pid: 54706 timeout: 481357 lvb_type: 0
[481448.959481] LustreError: 54055:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 6094 previous similar messages
[481449.172655] LustreError: 54686:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff89613abbc000 ns: mdt-fir-MDT0002_UUID lock: ffff896ddef83a80/0x1a4b7ac741a6d9cd lrc: 5/0,0 mode: PW/PW res: [0x2c000430a:0x3bcd:0x0].0x0 bits 0x40/0x0 rrc: 3 type: IBT flags: 0x50200000000000 nid: 10.8.7.15@o2ib6 remote: 0x5bf01d8596bafc22 expref: 58097 pid: 54686 timeout: 0 lvb_type: 0
[481449.355633] NMI watchdog: BUG: soft lockup - CPU#28 stuck for 22s! [ldlm_bl_03:55962]
[481449.363549] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin dcdbas amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ses dm_multipath ipmi_si enclosure pcspkr dm_mod sg ipmi_devintf ccp i2c_piix4 ipmi_msghandler k10temp acpi_power_meter knem(OE) ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif
[481449.436548] crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper ahci syscopyarea mlx5_core(OE) sysfillrect sysimgblt libahci mlxfw(OE) fb_sys_fops devlink ttm crct10dif_pclmul tg3 crct10dif_common mlx_compat(OE) drm megaraid_sas crc32c_intel libata ptp drm_panel_orientation_quirks pps_core mpt3sas(OE) raid_class scsi_transport_sas [last unloaded: libcfs]
[481449.470231] CPU: 28 PID: 55962 Comm: ldlm_bl_03 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1
[481449.482648] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018
[481449.490302] task: ffff8997eeacd140 ti: ffff8997fb308000 task.ti: ffff8997fb308000
[481449.497868] RIP: 0010:[] [] ldlm_inodebits_compat_queue+0x188/0x440 [ptlrpc]
[481449.508277] RSP: 0018:ffff8997fb30bbb0 EFLAGS: 00000282
[481449.513676] RAX: 0000000000000101 RBX: ffff89752aeabd20 RCX: ffff89752aeabd20
[481449.520896] RDX: ffff8997fb30bca8 RSI: ffff8995f523f740 RDI: ffff8958127a3600
[481449.528116] RBP: ffff8997fb30bc08 R08: ffff8997fb30bca8 R09: 00000000c0005f35
[481449.535336] R10: 0000000000000035 R11: ffff8995f523f740 R12: ffff8997fb30bca8
[481449.542556] R13: 00000000c0005f35 R14: 0000000000000035 R15: ffff8995f523f740
[481449.549776] FS: 00007fb564ea5700(0000) GS:ffff8967fefc0000(0000) knlGS:0000000000000000
[481449.557948] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[481449.563780] CR2: 00007fc038f6a000 CR3: 0000003dc7410000 CR4: 00000000003407e0
[481449.571000] Call Trace:
[481449.573578] [] ldlm_process_inodebits_lock+0x93/0x3d0 [ptlrpc]
[481449.581173] [] ? ldlm_inodebits_compat_queue+0x440/0x440 [ptlrpc]
[481449.589029] [] ldlm_reprocess_queue+0x1be/0x3f0 [ptlrpc]
[481449.596101] [] __ldlm_reprocess_all+0x10b/0x380 [ptlrpc]
[481449.603174] [] ldlm_cancel_lock_for_export.isra.26+0x1c2/0x390 [ptlrpc]
[481449.611548] [] ldlm_export_cancel_blocked_locks+0x121/0x200 [ptlrpc]
[481449.619672] [] ldlm_bl_thread_main+0x112/0x700 [ptlrpc]
[481449.626637] [] ? wake_up_state+0x20/0x20
[481449.632340] [] ? ldlm_handle_bl_callback+0x530/0x530 [ptlrpc]
[481449.639825] [] kthread+0xd1/0xe0
[481449.644790] [] ? insert_kthread_work+0x40/0x40
[481449.650973] [] ret_from_fork_nospec_begin+0xe/0x21
[481449.657496] [] ? insert_kthread_work+0x40/0x40
[481449.663675] Code: 74 0e 4c 89 f2 48 89 de 4c 89 ff e8 d3 a2 fc ff 49 8b 87 10 02 00 00 4c 8d a8 f0 fd ff ff 4d 39 fd 74 2b 49 83 bd a8 00 00 00 00 <74> 0e 4c 89 f2 48 89 de 4c 89 ef e8 a8 a2 fc ff 4d 8b ad 10 02
[481453.234730] NMI watchdog: BUG: soft lockup - CPU#14 stuck for 23s! [ldlm_bl_01:54053]
[481453.242644] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin dcdbas amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ses dm_multipath ipmi_si enclosure pcspkr dm_mod sg ipmi_devintf ccp i2c_piix4 ipmi_msghandler k10temp acpi_power_meter knem(OE) ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif
[481453.315644] crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper ahci syscopyarea mlx5_core(OE) sysfillrect sysimgblt libahci mlxfw(OE) fb_sys_fops devlink ttm crct10dif_pclmul tg3 crct10dif_common mlx_compat(OE) drm megaraid_sas crc32c_intel libata ptp drm_panel_orientation_quirks pps_core mpt3sas(OE) raid_class scsi_transport_sas [last unloaded: libcfs]
[481453.349327] CPU: 14 PID: 54053 Comm: ldlm_bl_01 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1
[481453.361744] Hardware name: Dell Inc.
PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018 [481453.369398] task: ffff8967f34730c0 ti: ffff89979e6c8000 task.ti: ffff89979e6c8000 [481453.376962] RIP: 0010:[] [] ldlm_inodebits_compat_queue+0x188/0x440 [ptlrpc] [481453.387370] RSP: 0018:ffff89979e6cbbb0 EFLAGS: 00000282 [481453.392770] RAX: 0000000000000101 RBX: ffff89970bfd31e0 RCX: ffff89970bfd31e0 [481453.399989] RDX: ffff89979e6cbca8 RSI: ffff8970f4a33840 RDI: ffff8993f9240240 [481453.407210] RBP: ffff89979e6cbc08 R08: ffff89979e6cbca8 R09: 00000000c0009415 [481453.414428] R10: 0000000000000015 R11: ffff8970f4a33840 R12: ffff89979e6cbca8 [481453.421648] R13: 00000000c0009415 R14: 0000000000000015 R15: ffff8970f4a33840 [481453.428869] FS: 00007f90f5793740(0000) GS:ffff8987ff6c0000(0000) knlGS:0000000000000000 [481453.437040] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [481453.442875] CR2: 00007f90f39a5620 CR3: 0000003dc7410000 CR4: 00000000003407e0 [481453.450095] Call Trace: [481453.452669] [] ldlm_process_inodebits_lock+0x93/0x3d0 [ptlrpc] [481453.460264] [] ? ldlm_inodebits_compat_queue+0x440/0x440 [ptlrpc] [481453.468115] [] ldlm_reprocess_queue+0x1be/0x3f0 [ptlrpc] [481453.475188] [] __ldlm_reprocess_all+0x10b/0x380 [ptlrpc] [481453.482259] [] ldlm_cancel_lock_for_export.isra.26+0x1c2/0x390 [ptlrpc] [481453.490630] [] ? ldlm_lock_put+0x33/0x690 [ptlrpc] [481453.497184] [] ldlm_export_cancel_blocked_locks+0x121/0x200 [ptlrpc] [481453.505298] [] ldlm_bl_thread_main+0x112/0x700 [ptlrpc] [481453.512264] [] ? wake_up_state+0x20/0x20 [481453.517952] [] ? ldlm_handle_bl_callback+0x530/0x530 [ptlrpc] [481453.525436] [] kthread+0xd1/0xe0 [481453.530402] [] ? insert_kthread_work+0x40/0x40 [481453.536583] [] ret_from_fork_nospec_begin+0xe/0x21 [481453.543107] [] ? 
insert_kthread_work+0x40/0x40 [481453.549303] Code: 74 0e 4c 89 f2 48 89 de 4c 89 ff e8 d3 a2 fc ff 49 8b 87 10 02 00 00 4c 8d a8 f0 fd ff ff 4d 39 fd 74 2b 49 83 bd a8 00 00 00 00 <74> 0e 4c 89 f2 48 89 de 4c 89 ef e8 a8 a2 fc ff 4d 8b ad 10 02 [481463.898114] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting [481463.908199] Lustre: Skipped 622 previous similar messages [481463.913711] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7) [481463.924083] Lustre: Skipped 1280 previous similar messages [481481.235436] NMI watchdog: BUG: soft lockup - CPU#14 stuck for 23s! [ldlm_bl_01:54053] [481481.243430] NMI watchdog: BUG: soft lockup - CPU#15 stuck for 23s! [migration/15:87] [481481.243352] Modules linked in: [481481.243431] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc
vfat fat dm_round_robin dcdbas amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ses dm_multipath ipmi_si enclosure pcspkr dm_mod sg ipmi_devintf ccp i2c_piix4 ipmi_msghandler k10temp acpi_power_meter knem(OE) ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper ahci syscopyarea mlx5_core(OE) sysfillrect sysimgblt libahci mlxfw(OE) fb_sys_fops devlink ttm crct10dif_pclmul tg3 crct10dif_common mlx_compat(OE) drm megaraid_sas crc32c_intel libata ptp drm_panel_orientation_quirks pps_core mpt3sas(OE) raid_class scsi_transport_sas [last unloaded: libcfs] [481481.243495] CPU: 15 PID: 87 Comm: migration/15 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1 [481481.243495] Hardware name: Dell Inc. 
PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018 [481481.243496] task: ffff8988a9468000 ti: ffff8988a9470000 task.ti: ffff8988a9470000 [481481.243497] RIP: 0010:[] [481481.243502] [] multi_cpu_stop+0x4a/0x110 [481481.243503] RSP: 0000:ffff8988a9473d98 EFLAGS: 00000246 [481481.243504] RAX: 0000000000000001 RBX: ffff8967fefdab80 RCX: dead000000000200 [481481.243504] RDX: ffff89983f4d5ff0 RSI: 0000000000000282 RDI: ffff8987d03cbab0 [481481.243505] RBP: ffff8988a9473dc0 R08: ffff8987d03cba80 R09: 0000000000000001 [481481.243506] R10: 000000000000bdff R11: 0000000000000001 R12: 0000000000000000 [481481.243506] R13: 000000000001ab80 R14: ffff8988a962f000 R15: 000000020000000f [481481.243508] FS: 00007fa72ea77700(0000) GS:ffff89983f4c0000(0000) knlGS:0000000000000000 [481481.243509] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [481481.243509] CR2: 00007fa720014300 CR3: 000000300da98000 CR4: 00000000003407e0 [481481.243510] Call Trace: [481481.243513] [] ? cpu_stop_should_run+0x50/0x50 [481481.243514] [] cpu_stopper_thread+0x99/0x150 [481481.243518] [] ? __schedule+0x3ff/0x890 [481481.243520] [] smpboot_thread_fn+0x144/0x1a0 [481481.243522] [] ? lg_double_unlock+0x40/0x40 [481481.243524] [] kthread+0xd1/0xe0 [481481.243525] [] ? insert_kthread_work+0x40/0x40 [481481.243527] [] ret_from_fork_nospec_begin+0xe/0x21 [481481.243529] [] ? 
insert_kthread_work+0x40/0x40 [481481.243529] Code: 66 90 66 90 49 89 c5 48 8b 47 18 48 85 c0 0f 84 b3 00 00 00 0f a3 18 19 db 85 db 41 0f 95 c6 45 31 ff 31 c0 0f 1f 44 00 00 f3 90 <41> 8b 5c 24 20 39 c3 74 5d 83 fb 02 74 68 83 fb 03 75 05 45 84 [481481.536182] osp(OE) [481481.538485] mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin dcdbas amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ses dm_multipath ipmi_si enclosure pcspkr dm_mod sg ipmi_devintf ccp 
i2c_piix4 ipmi_msghandler k10temp acpi_power_meter knem(OE) ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic [481481.609380] mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper ahci syscopyarea mlx5_core(OE) sysfillrect sysimgblt libahci mlxfw(OE) fb_sys_fops devlink ttm crct10dif_pclmul tg3 crct10dif_common mlx_compat(OE) drm megaraid_sas crc32c_intel libata ptp drm_panel_orientation_quirks pps_core mpt3sas(OE) raid_class scsi_transport_sas [last unloaded: libcfs] [481481.641476] CPU: 14 PID: 54053 Comm: ldlm_bl_01 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1 [481481.653893] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018 [481481.661548] task: ffff8967f34730c0 ti: ffff89979e6c8000 task.ti: ffff89979e6c8000 [481481.669114] RIP: 0010:[] [] ldlm_inodebits_compat_queue+0x188/0x440 [ptlrpc] [481481.679521] RSP: 0018:ffff89979e6cbbb0 EFLAGS: 00000282 [481481.684920] RAX: 0000000000000101 RBX: ffff89970bfd31e0 RCX: ffff89970bfd31e0 [481481.692141] RDX: ffff89979e6cbca8 RSI: ffff8970f4a33840 RDI: ffff8957d28e9440 [481481.699358] RBP: ffff89979e6cbc08 R08: ffff89979e6cbca8 R09: 00000000c0009415 [481481.706579] R10: 0000000000000015 R11: ffff8970f4a33840 R12: ffff89979e6cbca8 [481481.713800] R13: 00000000c0009415 R14: 0000000000000015 R15: ffff8970f4a33840 [481481.721018] FS: 00007f90f5793740(0000) GS:ffff8987ff6c0000(0000) knlGS:0000000000000000 [481481.729192] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [481481.735023] CR2: 00007f90f39a5620 CR3: 0000003dc7410000 CR4: 00000000003407e0 [481481.742243] Call Trace: [481481.744826] [] ldlm_process_inodebits_lock+0x93/0x3d0 [ptlrpc] [481481.752431] [] ? 
ldlm_inodebits_compat_queue+0x440/0x440 [ptlrpc] [481481.760287] [] ldlm_reprocess_queue+0x1be/0x3f0 [ptlrpc] [481481.767367] [] __ldlm_reprocess_all+0x10b/0x380 [ptlrpc] [481481.774451] [] ldlm_cancel_lock_for_export.isra.26+0x1c2/0x390 [ptlrpc] [481481.782829] [] ldlm_export_cancel_blocked_locks+0x121/0x200 [ptlrpc] [481481.790953] [] ldlm_bl_thread_main+0x112/0x700 [ptlrpc] [481481.797916] [] ? wake_up_state+0x20/0x20 [481481.803616] [] ? ldlm_handle_bl_callback+0x530/0x530 [ptlrpc] [481481.811097] [] kthread+0xd1/0xe0 [481481.816062] [] ? insert_kthread_work+0x40/0x40 [481481.822244] [] ret_from_fork_nospec_begin+0xe/0x21 [481481.828765] [] ? insert_kthread_work+0x40/0x40 [481481.834943] Code: 74 0e 4c 89 f2 48 89 de 4c 89 ff e8 d3 a2 fc ff 49 8b 87 10 02 00 00 4c 8d a8 f0 fd ff ff 4d 39 fd 74 2b 49 83 bd a8 00 00 00 00 <74> 0e 4c 89 f2 48 89 de 4c 89 ef e8 a8 a2 fc ff 4d 8b ad 10 02 [481509.236136] NMI watchdog: BUG: soft lockup - CPU#14 stuck for 22s! [ldlm_bl_01:54053] [481509.244133] NMI watchdog: BUG: soft lockup - CPU#15 stuck for 22s! 
[migration/15:87] [481509.244055] Modules linked in: [481509.244134] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin dcdbas amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ses dm_multipath ipmi_si enclosure pcspkr dm_mod sg ipmi_devintf ccp i2c_piix4 ipmi_msghandler k10temp acpi_power_meter
knem(OE) ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper ahci syscopyarea mlx5_core(OE) sysfillrect sysimgblt libahci mlxfw(OE) fb_sys_fops devlink ttm crct10dif_pclmul tg3 crct10dif_common mlx_compat(OE) drm megaraid_sas crc32c_intel libata ptp drm_panel_orientation_quirks pps_core mpt3sas(OE) raid_class scsi_transport_sas [last unloaded: libcfs] [481509.244196] CPU: 15 PID: 87 Comm: migration/15 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1 [481509.244196] Hardware name: Dell Inc. 
PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018 [481509.244197] task: ffff8988a9468000 ti: ffff8988a9470000 task.ti: ffff8988a9470000 [481509.244198] RIP: 0010:[] [481509.244202] [] multi_cpu_stop+0xb3/0x110 [481509.244203] RSP: 0000:ffff8988a9473d98 EFLAGS: 00000246 [481509.244204] RAX: 0000000000000001 RBX: ffff8967fefdab80 RCX: dead000000000200 [481509.244204] RDX: ffff89983f4d5ff0 RSI: 0000000000000282 RDI: ffff8987d03cbab0 [481509.244205] RBP: ffff8988a9473dc0 R08: ffff8987d03cba80 R09: 0000000000000001 [481509.244206] R10: 000000000000bdff R11: 0000000000000001 R12: 0000000000000000 [481509.244206] R13: 000000000001ab80 R14: ffff8988a962f000 R15: 000000020000000f [481509.244208] FS: 00007fa72ea77700(0000) GS:ffff89983f4c0000(0000) knlGS:0000000000000000 [481509.244208] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [481509.244209] CR2: 00007fa720014300 CR3: 000000300da98000 CR4: 00000000003407e0 [481509.244210] Call Trace: [481509.244213] [] ? cpu_stop_should_run+0x50/0x50 [481509.244214] [] cpu_stopper_thread+0x99/0x150 [481509.244217] [] ? __schedule+0x3ff/0x890 [481509.244220] [] smpboot_thread_fn+0x144/0x1a0 [481509.244221] [] ? lg_double_unlock+0x40/0x40 [481509.244223] [] kthread+0xd1/0xe0 [481509.244224] [] ? insert_kthread_work+0x40/0x40 [481509.244226] [] ret_from_fork_nospec_begin+0xe/0x21 [481509.244228] [] ? 
insert_kthread_work+0x40/0x40 [481509.244228] Code: 20 83 fb 04 74 0a 89 d8 eb b6 66 0f 1f 44 00 00 4c 89 ef 57 9d 66 66 90 66 90 5b 41 5c 41 5d 41 5e 44 89 f8 41 5f 5d c3 83 f8 01 <76> d9 e8 e6 8b 01 00 eb cd 0f 1f 40 00 fa 66 66 90 66 66 90 eb [481509.536865] osp(OE) [481509.539171] mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin dcdbas amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ses dm_multipath ipmi_si enclosure pcspkr dm_mod sg ipmi_devintf ccp 
i2c_piix4 ipmi_msghandler k10temp acpi_power_meter knem(OE) ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic [481509.610065] mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper ahci syscopyarea mlx5_core(OE) sysfillrect sysimgblt libahci mlxfw(OE) fb_sys_fops devlink ttm crct10dif_pclmul tg3 crct10dif_common mlx_compat(OE) drm megaraid_sas crc32c_intel libata ptp drm_panel_orientation_quirks pps_core mpt3sas(OE) raid_class scsi_transport_sas [last unloaded: libcfs] [481509.642162] CPU: 14 PID: 54053 Comm: ldlm_bl_01 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1 [481509.654586] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018 [481509.662240] task: ffff8967f34730c0 ti: ffff89979e6c8000 task.ti: ffff89979e6c8000 [481509.669805] RIP: 0010:[] [] ldlm_inodebits_compat_queue+0x188/0x440 [ptlrpc] [481509.680214] RSP: 0018:ffff89979e6cbbb0 EFLAGS: 00000282 [481509.685614] RAX: 0000000000000101 RBX: ffff89970bfd31e0 RCX: ffff89970bfd31e0 [481509.692833] RDX: ffff89979e6cbca8 RSI: ffff8970f4a33840 RDI: ffff8993fb3e1200 [481509.700053] RBP: ffff89979e6cbc08 R08: ffff89979e6cbca8 R09: 00000000c0009415 [481509.707272] R10: 0000000000000015 R11: ffff8970f4a33840 R12: ffff89979e6cbca8 [481509.714492] R13: 00000000c0009415 R14: 0000000000000015 R15: ffff8970f4a33840 [481509.721711] FS: 00007f90f5793740(0000) GS:ffff8987ff6c0000(0000) knlGS:0000000000000000 [481509.729884] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [481509.735717] CR2: 00007f90f39a5620 CR3: 0000003dc7410000 CR4: 00000000003407e0 [481509.742936] Call Trace: [481509.745520] [] ldlm_process_inodebits_lock+0x93/0x3d0 [ptlrpc] [481509.753123] [] ? ldlm_export_lock_object+0x10/0x10 [ptlrpc] [481509.760464] [] ? 
ldlm_inodebits_compat_queue+0x440/0x440 [ptlrpc] [481509.768321] [] ldlm_reprocess_queue+0x1be/0x3f0 [ptlrpc] [481509.775403] [] __ldlm_reprocess_all+0x10b/0x380 [ptlrpc] [481509.782482] [] ldlm_cancel_lock_for_export.isra.26+0x1c2/0x390 [ptlrpc] [481509.790863] [] ldlm_export_cancel_blocked_locks+0x121/0x200 [ptlrpc] [481509.798987] [] ldlm_bl_thread_main+0x112/0x700 [ptlrpc] [481509.805950] [] ? wake_up_state+0x20/0x20 [481509.811650] [] ? ldlm_handle_bl_callback+0x530/0x530 [ptlrpc] [481509.819130] [] kthread+0xd1/0xe0 [481509.824095] [] ? insert_kthread_work+0x40/0x40 [481509.830277] [] ret_from_fork_nospec_begin+0xe/0x21 [481509.836799] [] ? insert_kthread_work+0x40/0x40 [481509.842979] Code: 74 0e 4c 89 f2 48 89 de 4c 89 ff e8 d3 a2 fc ff 49 8b 87 10 02 00 00 4c 8d a8 f0 fd ff ff 4d 39 fd 74 2b 49 83 bd a8 00 00 00 00 <74> 0e 4c 89 f2 48 89 de 4c 89 ef e8 a8 a2 fc ff 4d 8b ad 10 02 [481516.050304] INFO: rcu_sched self-detected stall on CPU [481516.051302] INFO: rcu_sched self-detected stall on CPU [481516.051303] { 15 } [481516.051314] (t=60001 jiffies g=115591743 c=115591742 q=6728165) [481516.051315] Task dump for CPU 14: [481516.051316] ldlm_bl_01 R running task 0 54053 2 0x00000088 [481516.051318] Call Trace: [481516.051362] [] ? ldlm_export_cancel_blocked_locks+0x121/0x200 [ptlrpc] [481516.051394] [] ? ldlm_bl_thread_main+0x112/0x700 [ptlrpc] [481516.051397] [] ? wake_up_state+0x20/0x20 [481516.051426] [] ? ldlm_handle_bl_callback+0x530/0x530 [ptlrpc] [481516.051427] [] ? kthread+0xd1/0xe0 [481516.051429] [] ? insert_kthread_work+0x40/0x40 [481516.051431] [] ? ret_from_fork_nospec_begin+0xe/0x21 [481516.051432] [] ?
insert_kthread_work+0x40/0x40 [481516.051432] Task dump for CPU 15: [481516.051433] migration/15 R running task 0 87 2 0x00000000 [481516.051434] Call Trace: [481516.051436] [481516.051437] [] sched_show_task+0xa8/0x110 [481516.051439] [] dump_cpu_task+0x39/0x70 [481516.051441] [] rcu_dump_cpu_stacks+0x90/0xd0 [481516.051443] [] rcu_check_callbacks+0x442/0x730 [481516.051445] [] ? tick_sched_do_timer+0x50/0x50 [481516.051447] [] update_process_times+0x46/0x80 [481516.051448] [] tick_sched_handle+0x30/0x70 [481516.051450] [] tick_sched_timer+0x39/0x80 [481516.051452] [] __hrtimer_run_queues+0xf3/0x270 [481516.051453] [] hrtimer_interrupt+0xaf/0x1d0 [481516.051456] [] local_apic_timer_interrupt+0x3b/0x60 [481516.051458] [] smp_apic_timer_interrupt+0x43/0x60 [481516.051459] [] apic_timer_interrupt+0x162/0x170 [481516.051460] [481516.051462] [] ? multi_cpu_stop+0x4a/0x110 [481516.051463] [] ? cpu_stop_should_run+0x50/0x50 [481516.051464] [] cpu_stopper_thread+0x99/0x150 [481516.051466] [] ? __schedule+0x3ff/0x890 [481516.051468] [] smpboot_thread_fn+0x144/0x1a0 [481516.051469] [] ? lg_double_unlock+0x40/0x40 [481516.051470] [] kthread+0xd1/0xe0 [481516.051471] [] ? insert_kthread_work+0x40/0x40 [481516.051473] [] ret_from_fork_nospec_begin+0xe/0x21 [481516.051474] [] ? insert_kthread_work+0x40/0x40 [481516.284115] { 14} (t=60236 jiffies g=115591743 c=115591742 q=6729030) [481516.291101] Task dump for CPU 14: [481516.294505] ldlm_bl_01 R running task 0 54053 2 0x00000088 [481516.301690] Call Trace: [481516.304233] [] sched_show_task+0xa8/0x110 [481516.310619] [] dump_cpu_task+0x39/0x70 [481516.316107] [] rcu_dump_cpu_stacks+0x90/0xd0 [481516.322110] [] rcu_check_callbacks+0x442/0x730 [481516.328292] [] ?
tick_sched_do_timer+0x50/0x50 [481516.334472] [] update_process_times+0x46/0x80 [481516.340563] [] tick_sched_handle+0x30/0x70 [481516.346395] [] tick_sched_timer+0x39/0x80 [481516.352144] [] __hrtimer_run_queues+0xf3/0x270 [481516.358322] [] hrtimer_interrupt+0xaf/0x1d0 [481516.364244] [] local_apic_timer_interrupt+0x3b/0x60 [481516.370856] [] smp_apic_timer_interrupt+0x43/0x60 [481516.377294] [] apic_timer_interrupt+0x162/0x170 [481516.383556] [] ? ldlm_inodebits_compat_queue+0x19f/0x440 [ptlrpc] [481516.392107] [] ldlm_process_inodebits_lock+0x93/0x3d0 [ptlrpc] [481516.399707] [] ? ldlm_inodebits_compat_queue+0x440/0x440 [ptlrpc] [481516.407562] [] ldlm_reprocess_queue+0x1be/0x3f0 [ptlrpc] [481516.414645] [] __ldlm_reprocess_all+0x10b/0x380 [ptlrpc] [481516.421725] [] ldlm_cancel_lock_for_export.isra.26+0x1c2/0x390 [ptlrpc] [481516.430107] [] ldlm_export_cancel_blocked_locks+0x121/0x200 [ptlrpc] [481516.438229] [] ldlm_bl_thread_main+0x112/0x700 [ptlrpc] [481516.445193] [] ? wake_up_state+0x20/0x20 [481516.450884] [] ? ldlm_handle_bl_callback+0x530/0x530 [ptlrpc] [481516.458365] [] kthread+0xd1/0xe0 [481516.463329] [] ? insert_kthread_work+0x40/0x40 [481516.469510] [] ret_from_fork_nospec_begin+0xe/0x21 [481516.476035] [] ? insert_kthread_work+0x40/0x40 [481516.482212] Task dump for CPU 15: [481516.485619] migration/15 R running task 0 87 2 0x00000008 [481516.492802] Call Trace: [481516.495347] [] ? __schedule+0x3ff/0x890 [481516.500918] [] ? smpboot_thread_fn+0x144/0x1a0 [481516.507097] [] ? lg_double_unlock+0x40/0x40 [481516.513016] [] ? kthread+0xd1/0xe0 [481516.518155] [] ? insert_kthread_work+0x40/0x40 [481516.524333] [] ? ret_from_fork_nospec_begin+0xe/0x21 [481516.531035] [] ? 
insert_kthread_work+0x40/0x40 [481568.247546] LustreError: 63820:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff89974e7ef800 x1624928593468048/t0(0) o104->fir-MDT0002@10.9.103.35@o2ib4:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 [481568.268724] LustreError: 63820:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 84715 previous similar messages [481589.924184] LustreError: 54055:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 132s: evicting client at 10.9.102.26@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8995043333c0/0x1a4b7ac73961b84d lrc: 3/0,0 mode: PR/PR res: [0x2c0002b31:0x6c18:0x0].0x0 bits 0x40/0x0 rrc: 62179 type: IBT flags: 0x60000400000020 nid: 10.9.102.26@o2ib4 remote: 0x3c225897ac723412 expref: 62189 pid: 54217 timeout: 481577 lvb_type: 0 [481589.962793] LustreError: 54055:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 2933 previous similar messages [481721.182472] Lustre: 54572:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff896c1b39f500 x1624703655439088/t0(0) o101->386cceaf-b887-f800-5980-0888a79dd601@10.9.105.63@o2ib4:624/0 lens 480/568 e 23 to 0 dl 1549999014 ref 2 fl Interpret:/0/0 rc 0/0 [481731.078715] Lustre: 56088:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff899755e41b00 x1624748683253744/t30576195554(0) o36->40010d6e-d29b-c686-3f5e-dc2316139f55@10.9.107.27@o2ib4:634/0 lens 496/2888 e 20 to 0 dl 1549999024 ref 2 fl Interpret:/0/0 rc 0/0 [481824.255601] LustreError: 63820:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8995d33c7800 x1624928595679984/t0(0) o104->fir-MDT0002@10.9.103.35@o2ib4:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 [481824.276781] LustreError: 63820:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 69461 previous similar messages [481836.075351] LustreError: 
54692:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1549998824, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff896ebe26e540/0x1a4b7ac741a4f1da lrc: 3/0,1 mode: --/PW res: [0x2c0002b31:0x6c18:0x0].0x0 bits 0x40/0x0 rrc: 32361 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 54692 timeout: 0 lvb_type: 0 [481865.314091] Lustre: 63798:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-74), not sending early reply req@ffff896fe3a03900 x1624699740351616/t0(0) o101->4f2a8620-81f0-e31b-fbad-a029c3256423@10.9.105.43@o2ib4:13/0 lens 480/568 e 3 to 0 dl 1549999158 ref 2 fl Interpret:/0/0 rc 0/0 [481916.834377] LustreError: 56062:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1549998905, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8970f4a33840/0x1a4b7ac7416c39f7 lrc: 3/0,1 mode: --/PW res: [0x2c0002b3f:0x68d6:0x0].0x0 bits 0x40/0x0 rrc: 4467 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 56062 timeout: 0 lvb_type: 0 [481921.413860] LNet: Service thread pid 56062 completed after 795.72s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
[481923.932569] LustreError: 54055:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 99s: evicting client at 10.9.103.35@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff895c242172c0/0x1a4b7ac73a41cb71 lrc: 3/0,0 mode: PR/PR res: [0x2c00033c7:0x2b6e:0x0].0x0 bits 0x40/0x0 rrc: 70151 type: IBT flags: 0x60000400010020 nid: 10.9.103.35@o2ib4 remote: 0x8299540b6f9c2a4a expref: 70353 pid: 56135 timeout: 481911 lvb_type: 0
[481923.971088] LustreError: 54055:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 88329 previous similar messages
[481940.954425] LNet: Service thread pid 54692 completed after 750.09s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
[482064.998791] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[482065.008880] Lustre: Skipped 707 previous similar messages
[482065.014402] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[482065.024749] Lustre: Skipped 713 previous similar messages
[482208.719700] LustreError: 63820:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1549999197, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8995f523f740/0x1a4b7ac741771fee lrc: 3/0,1 mode: --/PW res: [0x2c00033c7:0x2b6e:0x0].0x0 bits 0x40/0x0 rrc: 39993 type: IBT flags: 0x40010080000000 nid: local remote: 0x0 expref: -99 pid: 63820 timeout: 0 lvb_type: 0
[482361.725697] LNet: Service thread pid 63820 completed after 1225.53s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
[482666.099950] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[482666.110041] Lustre: Skipped 1 previous similar message
[482666.115299] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[482666.125658] Lustre: Skipped 1 previous similar message
[482709.613274] Lustre: fir-MDT0002: haven't heard from client 38063fd0-8914-d8d4-6ba8-da9d20cf8181 (at 10.8.13.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff898687fb6000, cur 1549999998 expire 1549999848 last 1549999771
[482709.635091] Lustre: Skipped 5 previous similar messages
[482991.742352] LNet: Service thread pid 54713 was inactive for 200.48s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[482991.759374] Pid: 54713, comm: mdt01_029 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
[482991.769204] Call Trace:
[482991.771752] [] 0xffffffffffffffff
[482991.776915] LustreError: dumping log to /tmp/lustre-log.1550000280.54713
[483006.022112] Lustre: DEBUG MARKER: Tue Feb 12 11:38:14 2019
[483267.200831] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[483267.210942] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[483267.221286] Lustre: Skipped 3 previous similar messages
[483386.504260] Lustre: 54685:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff89722ffd9e00 x1624672896073168/t30578765447(0) o36->e9d1b5f8-7ec9-998b-fe00-a6102cb74525@10.9.102.2@o2ib4:24/0 lens 496/2888 e 24 to 0 dl 1550000679 ref 2 fl Interpret:/0/0 rc 0/0
[483426.631462] Lustre: fir-MDT0002: haven't heard from client 9e77b693-63df-849b-a228-a6c1f81932bb (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896b113eb400, cur 1550000715 expire 1550000565 last 1550000488
[483426.653281] Lustre: Skipped 2 previous similar messages
[483438.025928] LNet: Service thread pid 54713 completed after 646.76s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
[483864.642479] Lustre: fir-MDT0002: haven't heard from client a1af19c9-6240-5b50-a172-f2b4e5930368 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896278f3b800, cur 1550001153 expire 1550001003 last 1550000926
[483864.664268] Lustre: Skipped 2 previous similar messages
[483868.296478] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[483868.306568] Lustre: Skipped 1 previous similar message
[483868.311824] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[483868.322167] Lustre: Skipped 4 previous similar messages
[484391.655656] Lustre: fir-MDT0002: haven't heard from client 7453866c-ba6a-7bbb-051e-5cef18c08a04 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8977013c3000, cur 1550001680 expire 1550001530 last 1550001453
[484391.677448] Lustre: Skipped 2 previous similar messages
[484469.397492] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[484469.407609] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[484469.417959] Lustre: Skipped 3 previous similar messages
[484477.603634] LNet: Service thread pid 63801 was inactive for 200.67s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[484477.620658] Pid: 63801, comm: mdt01_061 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
[484477.630565] Call Trace:
[484477.633116] [] 0xffffffffffffffff
[484477.638241] LustreError: dumping log to /tmp/lustre-log.1550001765.63801
[484872.011537] Lustre: 56099:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8971b1e00f00 x1624748134779408/t30581055158(0) o36->9ac05ed6-3537-7c2f-d62f-875a75698c71@10.9.107.35@o2ib4:0/0 lens 496/2888 e 24 to 0 dl 1550002165 ref 2 fl Interpret:/0/0 rc 0/0
[484924.668893] Lustre: fir-MDT0000: haven't heard from client 36a9956b-003d-7e94-9d03-d77be3735083 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff895824987000, cur 1550002213 expire 1550002063 last 1550001986
[484924.690695] Lustre: Skipped 2 previous similar messages
[485070.493480] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[485070.503572] Lustre: Skipped 1 previous similar message
[485070.508834] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[485070.519177] Lustre: Skipped 7 previous similar messages
[485103.699755] Lustre: DEBUG MARKER: Tue Feb 12 12:13:12 2019
[485279.133050] LNet: Service thread pid 63801 completed after 1002.18s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
[485431.681597] Lustre: fir-MDT0002: haven't heard from client 576087f9-5682-a505-77bf-c479536e3b51 (at 10.8.11.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8980b32a0000, cur 1550002720 expire 1550002570 last 1550002493
[485431.703387] Lustre: Skipped 2 previous similar messages
[485671.594726] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[485671.604840] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[485671.615192] Lustre: Skipped 3 previous similar messages
[486272.690495] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[486272.700616] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[486666.970591] LNet: Service thread pid 54713 was inactive for 200.49s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[486666.987616] Pid: 54713, comm: mdt01_029 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
[486666.997441] Call Trace:
[486666.999997] [] ldlm_completion_ast+0x5b1/0x920 [ptlrpc]
[486667.007031] [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc]
[486667.014335] [] mdt_dom_discard_data+0x101/0x130 [mdt]
[486667.021173] [] mdt_reint_rename_internal.isra.46+0xb78/0x2760 [mdt]
[486667.029216] [] mdt_reint_rename_or_migrate.isra.51+0x19b/0x860 [mdt]
[486667.037345] [] mdt_reint_rename+0x13/0x20 [mdt]
[486667.043657] [] mdt_reint_rec+0x83/0x210 [mdt]
[486667.049791] [] mdt_reint_internal+0x6e3/0xaf0 [mdt]
[486667.056447] [] mdt_reint+0x67/0x140 [mdt]
[486667.062236] [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[486667.069277] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[486667.077082] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
[486667.083498] [] kthread+0xd1/0xe0
[486667.088499] [] ret_from_fork_nospec_begin+0xe/0x21
[486667.095075] [] 0xffffffffffffffff
[486667.100195] LustreError: dumping log to /tmp/lustre-log.1550003955.54713
[486668.288021] LNet: Service thread pid 63792 was inactive for 200.61s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[486668.305054] Pid: 63792, comm: mdt02_045 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
[486668.314883] Call Trace:
[486668.317439] [] ldlm_completion_ast+0x5b1/0x920 [ptlrpc]
[486668.324474] [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc]
[486668.331667] [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc]
[486668.338426] [] osp_md_object_lock+0x162/0x2d0 [osp]
[486668.345083] [] lod_object_lock+0xf3/0x7b0 [lod]
[486668.351402] [] mdd_object_lock+0x3e/0xe0 [mdd]
[486668.357625] [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt]
[486668.364994] [] mdt_remote_object_lock+0x2a/0x30 [mdt]
[486668.371821] [] mdt_reint_rename_or_migrate.isra.51+0x362/0x860 [mdt]
[486668.379951] [] mdt_reint_rename+0x13/0x20 [mdt]
[486668.386274] [] mdt_reint_rec+0x83/0x210 [mdt]
[486668.392414] [] mdt_reint_internal+0x6e3/0xaf0 [mdt]
[486668.399069] [] mdt_reint+0x67/0x140 [mdt]
[486668.404860] [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[486668.411896] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[486668.419698] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
[486668.426110] [] kthread+0xd1/0xe0
[486668.431110] [] ret_from_fork_nospec_begin+0xe/0x21
[486668.437672] [] 0xffffffffffffffff
[486668.442820] Pid: 56072, comm: mdt02_024 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
[486668.452644] Call Trace:
[486668.455195] [] ldlm_completion_ast+0x5b1/0x920 [ptlrpc]
[486668.462216] [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc]
[486668.469411] [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc]
[486668.476189] [] osp_md_object_lock+0x162/0x2d0 [osp]
[486668.482852] [] lod_object_lock+0xf3/0x7b0 [lod]
[486668.489162] [] mdd_object_lock+0x3e/0xe0 [mdd]
[486668.495383] [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt]
[486668.502756] [] mdt_remote_object_lock+0x2a/0x30 [mdt]
[486668.509590] [] mdt_reint_rename_or_migrate.isra.51+0x362/0x860 [mdt]
[486668.517720] [] mdt_reint_rename+0x13/0x20 [mdt]
[486668.524028] [] mdt_reint_rec+0x83/0x210 [mdt]
[486668.530165] [] mdt_reint_internal+0x6e3/0xaf0 [mdt]
[486668.536836] [] mdt_reint+0x67/0x140 [mdt]
[486668.542628] [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[486668.549667] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[486668.557481] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
[486668.563896] [] kthread+0xd1/0xe0
[486668.568896] [] ret_from_fork_nospec_begin+0xe/0x21
[486668.575459] [] 0xffffffffffffffff
[486668.580559] Pid: 54712, comm: mdt02_017 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
[486668.590400] Call Trace:
[486668.592948] [] ldlm_completion_ast+0x5b1/0x920 [ptlrpc]
[486668.599969] [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc]
[486668.607248] [] mdt_object_local_lock+0x50b/0xb20 [mdt]
[486668.614164] [] mdt_object_lock_internal+0x70/0x3e0 [mdt]
[486668.621252] [] mdt_getattr_name_lock+0x101d/0x1c30 [mdt]
[486668.628344] [] mdt_intent_getattr+0x2b5/0x480 [mdt]
[486668.634999] [] mdt_intent_policy+0x2e8/0xd00 [mdt]
[486668.641569] [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc]
[486668.648417] [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc]
[486668.655640] [] tgt_enqueue+0x62/0x210 [ptlrpc]
[486668.661893] [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[486668.668921] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[486668.676723] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
[486668.683136] [] kthread+0xd1/0xe0
[486668.688136] [] ret_from_fork_nospec_begin+0xe/0x21
[486668.694717] [] 0xffffffffffffffff
[486668.699832] Pid: 60307, comm: mdt01_057 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
[486668.709661] Call Trace:
[486668.712212] [] ldlm_completion_ast+0x5b1/0x920 [ptlrpc]
[486668.719234] [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc]
[486668.726426] [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc]
[486668.733187] [] osp_md_object_lock+0x162/0x2d0 [osp]
[486668.739844] [] lod_object_lock+0xf3/0x7b0 [lod]
[486668.746151] [] mdd_object_lock+0x3e/0xe0 [mdd]
[486668.752376] [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt]
[486668.759725] [] mdt_remote_object_lock+0x2a/0x30 [mdt]
[486668.766556] [] mdt_reint_rename_or_migrate.isra.51+0x362/0x860 [mdt]
[486668.774685] [] mdt_reint_rename+0x13/0x20 [mdt]
[486668.780992] [] mdt_reint_rec+0x83/0x210 [mdt]
[486668.787130] [] mdt_reint_internal+0x6e3/0xaf0 [mdt]
[486668.793787] [] mdt_reint+0x67/0x140 [mdt]
[486668.799575] [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[486668.806613] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[486668.814422] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
[486668.820836] [] kthread+0xd1/0xe0
[486668.825851] [] ret_from_fork_nospec_begin+0xe/0x21
[486668.832416] [] 0xffffffffffffffff
[486668.837525] LNet: Service thread pid 54472 was inactive for 202.35s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
[486669.018630] LustreError: dumping log to /tmp/lustre-log.1550003957.56083
[486673.114728] LNet: Service thread pid 54478 was inactive for 200.23s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
[486673.127682] LNet: Skipped 21 previous similar messages
[486673.132933] LustreError: dumping log to /tmp/lustre-log.1550003961.54478
[486674.138753] LNet: Service thread pid 54481 was inactive for 200.75s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
[486674.151710] LustreError: dumping log to /tmp/lustre-log.1550003962.54481
[486702.811478] LNet: Service thread pid 63791 was inactive for 200.60s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
[486702.824434] LustreError: dumping log to /tmp/lustre-log.1550003991.63791
[486715.099782] LNet: Service thread pid 57927 was inactive for 200.39s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
[486715.112738] LustreError: dumping log to /tmp/lustre-log.1550004003.57927
[486719.195891] LustreError: dumping log to /tmp/lustre-log.1550004007.54485
[486722.267959] LustreError: dumping log to /tmp/lustre-log.1550004010.56082
[486722.779978] LustreError: dumping log to /tmp/lustre-log.1550004011.57489
[486735.580296] LNet: Service thread pid 54479 was inactive for 200.51s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
[486735.593247] LNet: Skipped 3 previous similar messages
[486735.598398] LustreError: dumping log to /tmp/lustre-log.1550004023.54479
[486738.652375] LustreError: dumping log to /tmp/lustre-log.1550004026.54684
[486741.724456] LustreError: dumping log to /tmp/lustre-log.1550004030.54727
[486758.108862] LNet: Service thread pid 54682 was inactive for 212.45s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
[486758.121809] LNet: Skipped 2 previous similar messages
[486758.126960] LustreError: dumping log to /tmp/lustre-log.1550004046.54682
[486760.668935] LustreError: dumping log to /tmp/lustre-log.1550004048.54726
[486766.483073] LustreError: 56526:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1550003754, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff897dea6b0240/0x1a4b7ac75a4ecd13 lrc: 3/1,0 mode: --/PR res: [0x2c0003beb:0x9a9c:0x0].0x9d28c4e6 bits 0x2/0x0 rrc: 15 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 56526 timeout: 0 lvb_type: 0
[486766.484068] Lustre: fir-MDT0000-osp-MDT0002: Connection to fir-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete
[486766.484070] Lustre: Skipped 17 previous similar messages
[486766.484086] LustreError: 60307:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1550003754, 300s ago), entering recovery for fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff896d6abc8240/0x1a4b7ac75a4ece55 lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 30 type: IBT flags: 0x1000001000000 nid: local remote: 0x1a4b7ac75a4ece78 expref: -99 pid: 60307 timeout: 0 lvb_type: 0
[486766.484194] Lustre: fir-MDT0000: Received new LWP connection from 0@lo, removing former export from same NID
[486766.484196] Lustre: Skipped 281 previous similar messages
[486766.599242] LustreError: 56526:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 14 previous similar messages
[486767.199099] LustreError: 56072:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1550003755, 300s ago), entering recovery for fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff898698b4a400/0x1a4b7ac75a509341 lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 30 type: IBT flags: 0x1000001000000 nid: local remote: 0x1a4b7ac75a509348 expref: -99 pid: 56072 timeout: 0 lvb_type: 0
[486767.239272] LustreError: 56072:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 8 previous similar messages
[486768.702141] LustreError: 56083:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1550003756, 300s ago), entering recovery for fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff896dbcbb5100/0x1a4b7ac75a547b29 lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 32 type: IBT flags: 0x1000001000000 nid: local remote: 0x1a4b7ac75a547b30 expref: -99 pid: 56083 timeout: 0 lvb_type: 0
[486768.742306] LustreError: 56083:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message
[486772.886244] LustreError: 54478:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1550003761, 300s ago), entering recovery for fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff89654e6d18c0/0x1a4b7ac75a6054c9 lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 32 type: IBT flags: 0x1000001000000 nid: local remote: 0x1a4b7ac75a6054e5 expref: -99 pid: 54478 timeout: 0 lvb_type: 0
[486780.169528] Lustre: fir-MDT0000: Received new LWP connection from 10.0.10.52@o2ib7, removing former export from same NID
[486780.180479] Lustre: Skipped 1 previous similar message
[486802.203977] LustreError: 63791:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1550003790, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8983a1274ec0/0x1a4b7ac75ab5ec35 lrc: 3/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 102 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 63791 timeout: 0 lvb_type: 0
[486814.704289] LustreError: 57927:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1550003802, 300s ago), entering recovery for fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff897618ff9f80/0x1a4b7ac75adcf2d9 lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 32 type: IBT flags: 0x1000001000000 nid: local remote: 0x1a4b7ac75adcf2e0 expref: -99 pid: 57927 timeout: 0 lvb_type: 0
[486814.744463] LustreError: 57927:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message
[486835.067800] LustreError: 54479:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1550003823, 300s ago), entering recovery for fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff89957e280240/0x1a4b7ac75b1cfa32 lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 33 type: IBT flags: 0x1000001000000 nid: local remote: 0x1a4b7ac75b1cfa39 expref: -99 pid: 54479 timeout: 0 lvb_type: 0
[486835.107964] LustreError: 54479:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 3 previous similar messages
[486838.170876] LustreError: 54684:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1550003826, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8996153eb840/0x1a4b7ac75b266191 lrc: 3/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 108 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 54684 timeout: 0 lvb_type: 0
[486873.781379] Lustre: fir-MDT0002: Client 0ccdc4e2-9749-c9a5-afb4-85874ce74d6c (at 10.0.10.3@o2ib7) reconnecting
[486873.791497] Lustre: fir-MDT0002: Connection restored to df19807e-fa1a-b762-66a4-d38751783e21 (at 10.0.10.3@o2ib7)
[486873.801846] Lustre: Skipped 4 previous similar messages
[486876.895842] LNet: Service thread pid 56098 was inactive for 287.08s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
[486876.908803] LNet: Skipped 1 previous similar message
[486876.913864] LustreError: dumping log to /tmp/lustre-log.1550004165.56098
[486889.815167] LustreError: 56098:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1550003878, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff896e9d7af500/0x1a4b7ac75bb597a1 lrc: 3/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 116 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 56098 timeout: 0 lvb_type: 0
[486898.084379] LustreError: 54476:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1550003886, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff89730d364a40/0x1a4b7ac75be083d3 lrc: 3/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 116 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 54476 timeout: 0 lvb_type: 0
[486910.688692] LustreError: dumping log to /tmp/lustre-log.1550004198.54476
[486917.974887] LustreError: 54286:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1550003906, 300s ago), entering recovery for fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff89775aa9b840/0x1a4b7ac75c0fe25e lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 35 type: IBT flags: 0x1000001000000 nid: local remote: 0x1a4b7ac75c0fe265 expref: -99 pid: 54286 timeout: 0 lvb_type: 0
[486918.015061] LustreError: 54286:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 4 previous similar messages
[486937.572371] LustreError: 54701:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1550003925, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff89707069c800/0x1a4b7ac75c3e1cef lrc: 3/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 122 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 54701 timeout: 0 lvb_type: 0
[486981.346464] LNet: Service thread pid 56075 was inactive for 363.36s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[486981.363488] LNet: Skipped 3 previous similar messages
[486981.368637] Pid: 56075, comm: mdt02_025 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
[486981.378509] Call Trace:
[486981.381072] [] ldlm_completion_ast+0x63d/0x920 [ptlrpc]
[486981.388121] [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc]
[486981.395325] [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc]
[486981.402076] [] osp_md_object_lock+0x162/0x2d0 [osp]
[486981.408734] [] lod_object_lock+0xf3/0x7b0 [lod]
[486981.415068] [] mdd_object_lock+0x3e/0xe0 [mdd]
[486981.421293] [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt]
[486981.428676] [] mdt_remote_object_lock+0x2a/0x30 [mdt]
[486981.435504] [] mdt_reint_rename_or_migrate.isra.51+0x362/0x860 [mdt]
[486981.443634] [] mdt_reint_rename+0x13/0x20 [mdt]
[486981.449953] [] mdt_reint_rec+0x83/0x210 [mdt]
[486981.456088] [] mdt_reint_internal+0x6e3/0xaf0 [mdt]
[486981.462746] [] mdt_reint+0x67/0x140 [mdt]
[486981.468525] [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[486981.475554] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[486981.483355] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
[486981.489781] [] kthread+0xd1/0xe0
[486981.494785] [] ret_from_fork_nospec_begin+0xe/0x21
[486981.501348] [] 0xffffffffffffffff
[486981.506457] LustreError: dumping log to /tmp/lustre-log.1550004269.56075
[486981.809958] Pid: 54286, comm: mdt01_003 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
[486981.819792] Call Trace:
[486981.822350] [] ldlm_completion_ast+0x63d/0x920 [ptlrpc]
[486981.829370] [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc]
[486981.836572] [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc]
[486981.843333] [] osp_md_object_lock+0x162/0x2d0 [osp]
[486981.849997] [] lod_object_lock+0xf3/0x7b0 [lod]
[486981.856309] [] mdd_object_lock+0x3e/0xe0 [mdd]
[486981.862522] [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt]
[486981.869872] [] mdt_remote_object_lock+0x2a/0x30 [mdt]
[486981.876701] [] mdt_reint_rename_or_migrate.isra.51+0x362/0x860 [mdt]
[486981.884829] [] mdt_reint_rename+0x13/0x20 [mdt]
[486981.891138] [] mdt_reint_rec+0x83/0x210 [mdt]
[486981.897275] [] mdt_reint_internal+0x6e3/0xaf0 [mdt]
[486981.903946] [] mdt_reint+0x67/0x140 [mdt]
[486981.909740] [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[486981.916775] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[486981.924584] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
[486981.931008] [] kthread+0xd1/0xe0
[486981.936009] [] ret_from_fork_nospec_begin+0xe/0x21
[486981.942569] [] 0xffffffffffffffff
[486981.947687] Pid: 56062, comm: mdt01_039 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
[486981.957514] Call Trace:
[486981.960069] [] ldlm_completion_ast+0x63d/0x920 [ptlrpc]
[486981.967115] [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc]
[486981.974326] [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc]
[486981.981095] [] osp_md_object_lock+0x162/0x2d0 [osp]
[486981.987740] [] lod_object_lock+0xf3/0x7b0 [lod]
[486981.994051] [] mdd_object_lock+0x3e/0xe0 [mdd]
[486982.000272] [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt]
[486982.007623] [] mdt_remote_object_lock+0x2a/0x30 [mdt]
[486982.014452] [] mdt_reint_rename_or_migrate.isra.51+0x362/0x860 [mdt]
[486982.022581] [] mdt_reint_rename+0x13/0x20 [mdt]
[486982.028892] [] mdt_reint_rec+0x83/0x210 [mdt]
[486982.035027] [] mdt_reint_internal+0x6e3/0xaf0 [mdt]
[486982.041698] [] mdt_reint+0x67/0x140 [mdt]
[486982.047491] [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[486982.054537] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[486982.062345] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
[486982.068767] [] kthread+0xd1/0xe0
[486982.073767] [] ret_from_fork_nospec_begin+0xe/0x21
[486982.080329] [] 0xffffffffffffffff
[486989.538666] LNet: Service thread pid 63803 was inactive for 362.33s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[486989.555691] LNet: Skipped 2 previous similar messages
[486989.560840] Pid: 63803, comm: mdt01_062 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
[486989.570666] Call Trace:
[486989.573221] [] ldlm_completion_ast+0x63d/0x920 [ptlrpc]
[486989.580242] [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc]
[486989.587467] [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc]
[486989.594227] [] osp_md_object_lock+0x162/0x2d0 [osp]
[486989.600886] [] lod_object_lock+0xf3/0x7b0 [lod]
[486989.607194] [] mdd_object_lock+0x3e/0xe0 [mdd]
[486989.613416] [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt]
[486989.620783] [] mdt_remote_object_lock+0x2a/0x30 [mdt]
[486989.627612] [] mdt_reint_rename_or_migrate.isra.51+0x362/0x860 [mdt]
[486989.635744] [] mdt_reint_rename+0x13/0x20 [mdt]
[486989.642080] [] mdt_reint_rec+0x83/0x210 [mdt]
[486989.648230] [] mdt_reint_internal+0x6e3/0xaf0 [mdt]
[486989.654907] [] mdt_reint+0x67/0x140 [mdt]
[486989.660704] [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[486989.667777] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[486989.675576] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
[486989.682004] [] kthread+0xd1/0xe0
[486989.687007] [] ret_from_fork_nospec_begin+0xe/0x21
[486989.693570] [] 0xffffffffffffffff
[486989.698675] LustreError: dumping log to /tmp/lustre-log.1550004277.63803
[486989.853698] LustreError: 54672:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1550003978, 300s ago), entering recovery for fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff8957c7c32ac0/0x1a4b7ac75cc5f5f2 lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 36 type: IBT flags: 0x1000001000000 nid: local remote: 0x1a4b7ac75cc5f623 expref: -99 pid: 54672 timeout: 0 lvb_type: 0
[486989.893896] LustreError: 54672:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 3 previous similar messages
[487024.355547] LNet: Service thread pid 54701 was inactive for 386.78s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[487024.372592] Pid: 54701, comm: mdt01_024 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
[487024.382419] Call Trace:
[487024.384992] [] ldlm_completion_ast+0x63d/0x920 [ptlrpc]
[487024.392022] [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc]
[487024.399304] [] mdt_reint_rename_or_migrate.isra.51+0x67a/0x860 [mdt]
[487024.407458] [] mdt_reint_rename+0x13/0x20 [mdt]
[487024.413769] [] mdt_reint_rec+0x83/0x210 [mdt]
[487024.419904] [] mdt_reint_internal+0x6e3/0xaf0 [mdt]
[487024.426569] [] mdt_reint+0x67/0x140 [mdt]
[487024.432351] [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[487024.439404] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[487024.447207] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
[487024.453634] [] kthread+0xd1/0xe0
[487024.458640] [] ret_from_fork_nospec_begin+0xe/0x21
[487024.465221] [] 0xffffffffffffffff
[487024.470355] LustreError: dumping log to /tmp/lustre-log.1550004312.54701
[487060.916476] Lustre: 56100:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff897753e4d700 x1624733225727744/t0(0) o101->9468a1d3-3abd-8063-5952-288cca0f1dec@10.8.27.35@o2ib6:679/0 lens 592/3264 e 24 to 0 dl 1550004354 ref 2 fl Interpret:/0/0 rc 0/0
[487061.604500] Lustre: 54705:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff896665b2b900 x1624759272056352/t0(0) o36->105b4037-1bdb-9aac-e52a-dc0e974000a2@10.9.112.12@o2ib4:679/0 lens 584/2888 e 24 to 0 dl 1550004354 ref 2 fl Interpret:/0/0 rc 0/0
[487061.633573] Lustre: 54705:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 20 previous similar messages
[487062.920524] Lustre: 54728:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8972b3202a00 x1624745621954256/t0(0) o36->fc60883d-f9c1-82aa-8312-f53a10d6b6ff@10.8.9.1@o2ib6:681/0 lens 640/2888 e 23 to 0 dl 1550004356 ref 2 fl Interpret:/0/0 rc 0/0
[487062.949338] Lustre: 54728:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 4 previous similar messages
[487067.231617] Lustre: fir-MDT0000-osp-MDT0002: Connection to fir-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete
[487067.246672] LustreError: 54691:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1550004055, 300s ago), entering recovery for fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff89869ea0f080/0x1a4b7ac75d8c42e7 lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 38 type: IBT flags: 0x1000001000000 nid: local remote: 0x1a4b7ac75d8c42ee expref: -99 pid: 54691 timeout: 0 lvb_type: 0
[487067.286860] LustreError: 54691:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 1 previous similar message
[487067.297923] Lustre: fir-MDT0000: Received new LWP connection from 0@lo, removing former export from same NID
[487067.924646] Lustre: 54728:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff897068f63f00 x1624699439258064/t0(0) o36->82c763a5-8dc0-7fc4-d90b-e4497b03f725@10.9.104.57@o2ib4:686/0 lens 552/2888 e 23 to 0 dl 1550004361 ref 2 fl Interpret:/0/0 rc 0/0
[487082.192998] LustreError: 54212:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1550004070, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8972d3b39200/0x1a4b7ac75dadf533 lrc: 3/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 151 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 54212 timeout: 0 lvb_type: 0
[487097.381388] Lustre: 63794:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't
add any time (5/5), not sending early reply req@ffff89834ef60f00 x1624748845592656/t0(0) o36->c243879d-6590-e58d-10d6-105c5b7b4def@10.8.28.1@o2ib6:715/0 lens 520/2888 e 11 to 0 dl 1550004390 ref 2 fl Interpret:/0/0 rc 0/0 [487097.410287] Lustre: 63794:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message [487108.997678] Lustre: 54707:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff89744df2b600 x1624705420027024/t0(0) o36->6c7afae7-804f-5837-942b-b1962fecb1db@10.8.8.33@o2ib6:727/0 lens 552/2888 e 9 to 0 dl 1550004402 ref 2 fl Interpret:/0/0 rc 0/0 [487109.962276] Lustre: fir-MDT0000: Received new LWP connection from 10.0.10.52@o2ib7, removing former export from same NID [487109.973238] Lustre: Skipped 1 previous similar message [487130.022207] Lustre: 54538:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff898169255a00 x1624712098374304/t0(0) o36->bdf3d0f5-851d-26cd-ba90-9355de313856@10.8.6.19@o2ib6:748/0 lens 520/2888 e 6 to 0 dl 1550004423 ref 2 fl Interpret:/0/0 rc 0/0 [487130.051026] Lustre: 54538:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 3 previous similar messages [487175.911347] LNet: Service thread pid 54672 was inactive for 486.05s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
[487175.924302] LNet: Skipped 1 previous similar message [487175.929360] LustreError: dumping log to /tmp/lustre-log.1550004464.54672 [487185.127593] Lustre: 54686:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff897c82ad0000 x1624702288689424/t0(0) o36->e7658bee-b529-b857-21ef-217c5e9fe7b7@10.9.113.9@o2ib4:48/0 lens 536/2888 e 3 to 0 dl 1550004478 ref 2 fl Interpret:/0/0 rc 0/0 [487185.156412] Lustre: 54686:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 4 previous similar messages [487199.085945] LustreError: 54694:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1550004187, 300s ago), entering recovery for fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff8996aabea880/0x1a4b7ac75edb4280 lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 40 type: IBT flags: 0x1000001000000 nid: local remote: 0x1a4b7ac75edb428e expref: -99 pid: 54694 timeout: 0 lvb_type: 0 [487199.126110] LustreError: 54694:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 2 previous similar messages [487241.311994] LustreError: 63820:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1550004229, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff895bb9c4c140/0x1a4b7ac75f41d6dd lrc: 3/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 172 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 63820 timeout: 0 lvb_type: 0 [487285.354106] Lustre: 60308:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff897c82b25a00 x1624925569064752/t0(0) o36->0a270a28-384b-10a4-3edd-bee22242e3ea@10.8.4.35@o2ib6:148/0 lens 560/2888 e 2 to 0 dl 1550004578 ref 2 fl Interpret:/0/0 rc 0/0 [487285.382920] Lustre: 60308:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 6 
previous similar messages [487382.764545] LNet: Service thread pid 55382 was inactive for 615.50s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: [487382.781570] Pid: 55382, comm: mdt00_022 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 [487382.791400] Call Trace: [487382.793956] [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] [487382.800990] [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] [487382.808201] [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] [487382.814963] [] osp_md_object_lock+0x162/0x2d0 [osp] [487382.821618] [] lod_object_lock+0xf3/0x7b0 [lod] [487382.827927] [] mdd_object_lock+0x3e/0xe0 [mdd] [487382.834150] [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] [487382.841499] [] mdt_remote_object_lock+0x2a/0x30 [mdt] [487382.848335] [] mdt_reint_rename_or_migrate.isra.51+0x362/0x860 [mdt] [487382.856467] [] mdt_reint_rename+0x13/0x20 [mdt] [487382.862793] [] mdt_reint_rec+0x83/0x210 [mdt] [487382.868932] [] mdt_reint_internal+0x6e3/0xaf0 [mdt] [487382.875588] [] mdt_reint+0x67/0x140 [mdt] [487382.881383] [] tgt_request_handle+0xaea/0x1580 [ptlrpc] [487382.888422] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [487382.896234] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] [487382.902653] [] kthread+0xd1/0xe0 [487382.907656] [] ret_from_fork_nospec_begin+0xe/0x21 [487382.914214] [] 0xffffffffffffffff [487382.919323] LustreError: dumping log to /tmp/lustre-log.1550004671.55382 [487399.845568] Lustre: fir-MDT0000: Received new LWP connection from 10.0.10.52@o2ib7, removing former export from same NID [487417.069408] Lustre: 54696:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff89662c672400 x1624763344710448/t0(0) o36->0405d36b-1dfe-417d-33da-88f65ca0bd9f@10.8.28.4@o2ib6:280/0 lens 552/2888 e 1 to 0 dl 1550004710 ref 2 fl Interpret:/0/0 rc 0/0 [487417.098223] Lustre: 
54696:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 4 previous similar messages [487429.869721] LNet: Service thread pid 54691 was inactive for 662.62s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: [487429.886770] Pid: 54691, comm: mdt02_013 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 [487429.896601] Call Trace: [487429.899155] [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] [487429.906202] [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] [487429.913417] [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] [487429.920186] [] osp_md_object_lock+0x162/0x2d0 [osp] [487429.926842] [] lod_object_lock+0xf3/0x7b0 [lod] [487429.933151] [] mdd_object_lock+0x3e/0xe0 [mdd] [487429.939375] [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] [487429.946740] [] mdt_remote_object_lock+0x2a/0x30 [mdt] [487429.953570] [] mdt_reint_rename_or_migrate.isra.51+0x362/0x860 [mdt] [487429.961718] [] mdt_reint_rename+0x13/0x20 [mdt] [487429.968027] [] mdt_reint_rec+0x83/0x210 [mdt] [487429.974178] [] mdt_reint_internal+0x6e3/0xaf0 [mdt] [487429.980838] [] mdt_reint+0x67/0x140 [mdt] [487429.986627] [] tgt_request_handle+0xaea/0x1580 [ptlrpc] [487429.993672] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [487430.001483] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] [487430.007895] [] kthread+0xd1/0xe0 [487430.012911] [] ret_from_fork_nospec_begin+0xe/0x21 [487430.019474] [] 0xffffffffffffffff [487430.024595] LustreError: dumping log to /tmp/lustre-log.1550004718.54691 [487434.250829] Lustre: fir-MDT0000-osp-MDT0002: Connection to fir-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete [487470.830743] LNet: Service thread pid 54212 was inactive for 688.62s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: [487470.847773] Pid: 54212, comm: mdt01_001 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 [487470.857595] Call Trace: [487470.860151] [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] [487470.867169] [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] [487470.874466] [] mdt_reint_rename_or_migrate.isra.51+0x67a/0x860 [mdt] [487470.882606] [] mdt_reint_rename+0x13/0x20 [mdt] [487470.888931] [] mdt_reint_rec+0x83/0x210 [mdt] [487470.895069] [] mdt_reint_internal+0x6e3/0xaf0 [mdt] [487470.901724] [] mdt_reint+0x67/0x140 [mdt] [487470.907514] [] tgt_request_handle+0xaea/0x1580 [ptlrpc] [487470.914570] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [487470.922378] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] [487470.928808] [] kthread+0xd1/0xe0 [487470.933809] [] ret_from_fork_nospec_begin+0xe/0x21 [487470.940400] [] 0xffffffffffffffff [487470.945520] LustreError: dumping log to /tmp/lustre-log.1550004759.54212 [487473.734830] LustreError: 54707:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1550004462, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff896bb53cfbc0/0x1a4b7ac761353368 lrc: 3/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 201 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 54707 timeout: 0 lvb_type: 0 [487473.774221] LustreError: 54707:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message [487474.748915] Lustre: fir-MDT0002: Client e9560223-f857-8af8-8e66-18924c1e4b0e (at 10.8.3.22@o2ib6) reconnecting [487474.759010] Lustre: Skipped 52 previous similar messages [487474.764445] Lustre: fir-MDT0002: Connection restored to 6e1d62f3-a431-7ead-00c3-acc48801def9 (at 10.8.3.22@o2ib6) [487474.774808] Lustre: Skipped 59 previous similar messages [487546.400654] LustreError: 60306:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) 
### lock timed out (enqueued at 1550004534, 300s ago), entering recovery for fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff896d0a60a880/0x1a4b7ac761bad0be lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 47 type: IBT flags: 0x1000001000000 nid: local remote: 0x1a4b7ac761bad0cc expref: -99 pid: 60306 timeout: 0 lvb_type: 0 [487546.440824] LustreError: 60306:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 5 previous similar messages [487589.617727] LNet: Service thread pid 54473 was inactive for 767.76s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: [487589.634763] Pid: 54473, comm: mdt00_004 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 [487589.644591] Call Trace: [487589.647164] [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] [487589.654206] [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] [487589.661399] [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] [487589.668160] [] osp_md_object_lock+0x162/0x2d0 [osp] [487589.674808] [] lod_object_lock+0xf3/0x7b0 [lod] [487589.681116] [] mdd_object_lock+0x3e/0xe0 [mdd] [487589.687341] [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] [487589.694688] [] mdt_remote_object_lock+0x2a/0x30 [mdt] [487589.701520] [] mdt_reint_rename_or_migrate.isra.51+0x362/0x860 [mdt] [487589.709663] [] mdt_reint_rename+0x13/0x20 [mdt] [487589.715975] [] mdt_reint_rec+0x83/0x210 [mdt] [487589.722126] [] mdt_reint_internal+0x6e3/0xaf0 [mdt] [487589.728785] [] mdt_reint+0x67/0x140 [mdt] [487589.734572] [] tgt_request_handle+0xaea/0x1580 [ptlrpc] [487589.741621] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [487589.749429] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] [487589.755851] [] kthread+0xd1/0xe0 [487589.760852] [] ret_from_fork_nospec_begin+0xe/0x21 [487589.767413] [] 0xffffffffffffffff [487589.772536] LustreError: dumping log to /tmp/lustre-log.1550004878.54473 [487697.140442] Lustre: 
63794:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-95), not sending early reply req@ffff89946f6e3c00 x1624711672652240/t0(0) o36->b0f4a89e-7973-eb31-a1c9-fdc42b6cc4f6@10.8.18.25@o2ib6:560/0 lens 560/2888 e 0 to 0 dl 1550004990 ref 2 fl Interpret:/0/0 rc 0/0 [487697.169518] Lustre: 63794:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 4 previous similar messages [487737.077426] LNet: Service thread pid 56097 was inactive for 863.37s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: [487737.094472] Pid: 56097, comm: mdt01_047 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 [487737.104299] Call Trace: [487737.106859] [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] [487737.113891] [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] [487737.121101] [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] [487737.127870] [] osp_md_object_lock+0x162/0x2d0 [osp] [487737.134526] [] lod_object_lock+0xf3/0x7b0 [lod] [487737.140834] [] mdd_object_lock+0x3e/0xe0 [mdd] [487737.147058] [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] [487737.154424] [] mdt_remote_object_lock+0x2a/0x30 [mdt] [487737.161254] [] mdt_reint_rename_or_migrate.isra.51+0x362/0x860 [mdt] [487737.169382] [] mdt_reint_rename+0x13/0x20 [mdt] [487737.175694] [] mdt_reint_rec+0x83/0x210 [mdt] [487737.181845] [] mdt_reint_internal+0x6e3/0xaf0 [mdt] [487737.188518] [] mdt_reint+0x67/0x140 [mdt] [487737.194309] [] tgt_request_handle+0xaea/0x1580 [ptlrpc] [487737.201347] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [487737.209166] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] [487737.215577] [] kthread+0xd1/0xe0 [487737.220579] [] ret_from_fork_nospec_begin+0xe/0x21 [487737.227140] [] 0xffffffffffffffff [487737.232263] LustreError: dumping log to /tmp/lustre-log.1550005025.56097 [487812.742392] Lustre: fir-MDT0002: haven't heard from client 3d648bf8-1d55-290d-eec0-122a40706ad8 (at 10.8.27.23@o2ib6) in 227 
seconds. I think it's dead, and I am evicting it. exp ffff89621bbc9000, cur 1550005101 expire 1550004951 last 1550004874 [487812.764183] Lustre: Skipped 2 previous similar messages [487812.855331] Pid: 54694, comm: mdt01_021 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 [487812.865164] Call Trace: [487812.867714] [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] [487812.874739] [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] [487812.881950] [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] [487812.888710] [] osp_md_object_lock+0x162/0x2d0 [osp] [487812.895358] [] lod_object_lock+0xf3/0x7b0 [lod] [487812.901668] [] mdd_object_lock+0x3e/0xe0 [mdd] [487812.907891] [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] [487812.915240] [] mdt_remote_object_lock+0x2a/0x30 [mdt] [487812.922069] [] mdt_reint_rename_or_migrate.isra.51+0x362/0x860 [mdt] [487812.930216] [] mdt_reint_rename+0x13/0x20 [mdt] [487812.936526] [] mdt_reint_rec+0x83/0x210 [mdt] [487812.942665] [] mdt_reint_internal+0x6e3/0xaf0 [mdt] [487812.949327] [] mdt_reint+0x67/0x140 [mdt] [487812.955115] [] tgt_request_handle+0xaea/0x1580 [ptlrpc] [487812.962155] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [487812.969979] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] [487812.976394] [] kthread+0xd1/0xe0 [487812.981396] [] ret_from_fork_nospec_begin+0xe/0x21 [487812.987956] [] 0xffffffffffffffff [487812.993061] LustreError: dumping log to /tmp/lustre-log.1550005101.54694 [487855.652404] Lustre: fir-MDT0000-osp-MDT0002: Connection to fir-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete [487855.667564] Lustre: fir-MDT0000: Received new LWP connection from 0@lo, removing former export from same NID [487855.677486] Lustre: Skipped 2 previous similar messages [487929.594258] Pid: 63820, comm: mdt03_033 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 [487929.604088] Call Trace: [487929.606645] [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] 
[487929.613706] [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] [487929.620986] [] mdt_reint_rename_or_migrate.isra.51+0x67a/0x860 [mdt] [487929.629123] [] mdt_reint_rename+0x13/0x20 [mdt] [487929.635434] [] mdt_reint_rec+0x83/0x210 [mdt] [487929.641569] [] mdt_reint_internal+0x6e3/0xaf0 [mdt] [487929.648225] [] mdt_reint+0x67/0x140 [mdt] [487929.654007] [] tgt_request_handle+0xaea/0x1580 [ptlrpc] [487929.661044] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [487929.668868] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] [487929.675285] [] kthread+0xd1/0xe0 [487929.680275] [] ret_from_fork_nospec_begin+0xe/0x21 [487929.686839] [] 0xffffffffffffffff [487929.691953] LustreError: dumping log to /tmp/lustre-log.1550005217.63820 [487978.747493] Pid: 56135, comm: mdt00_035 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 [487978.757317] Call Trace: [487978.759883] [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] [487978.766926] [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] [487978.774138] [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] [487978.780898] [] osp_md_object_lock+0x162/0x2d0 [osp] [487978.787572] [] lod_object_lock+0xf3/0x7b0 [lod] [487978.793879] [] mdd_object_lock+0x3e/0xe0 [mdd] [487978.800118] [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] [487978.807485] [] mdt_remote_object_lock+0x2a/0x30 [mdt] [487978.814322] [] mdt_reint_rename_or_migrate.isra.51+0x362/0x860 [mdt] [487978.822458] [] mdt_reint_rename+0x13/0x20 [mdt] [487978.828808] [] mdt_reint_rec+0x83/0x210 [mdt] [487978.834960] [] mdt_reint_internal+0x6e3/0xaf0 [mdt] [487978.841625] [] mdt_reint+0x67/0x140 [mdt] [487978.847417] [] tgt_request_handle+0xaea/0x1580 [ptlrpc] [487978.854477] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [487978.862289] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] [487978.868725] [] kthread+0xd1/0xe0 [487978.873728] [] ret_from_fork_nospec_begin+0xe/0x21 [487978.880289] [] 0xffffffffffffffff [487978.885398] LustreError: dumping log to /tmp/lustre-log.1550005267.56135 
[488075.857832] Lustre: fir-MDT0002: Client e9560223-f857-8af8-8e66-18924c1e4b0e (at 10.8.3.22@o2ib6) reconnecting [488075.867923] Lustre: Skipped 64 previous similar messages [488075.873348] Lustre: fir-MDT0002: Connection restored to 6e1d62f3-a431-7ead-00c3-acc48801def9 (at 10.8.3.22@o2ib6) [488075.883711] Lustre: Skipped 69 previous similar messages [488118.014994] LNet: Service thread pid 63807 was inactive for 1115.51s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: [488118.032104] LNet: Skipped 3 previous similar messages [488118.037255] Pid: 63807, comm: mdt02_052 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 [488118.047079] Call Trace: [488118.049635] [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] [488118.056662] [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] [488118.063942] [] mdt_reint_rename_or_migrate.isra.51+0x67a/0x860 [mdt] [488118.072080] [] mdt_reint_rename+0x13/0x20 [mdt] [488118.078390] [] mdt_reint_rec+0x83/0x210 [mdt] [488118.084542] [] mdt_reint_internal+0x6e3/0xaf0 [mdt] [488118.091201] [] mdt_reint+0x67/0x140 [mdt] [488118.096981] [] tgt_request_handle+0xaea/0x1580 [ptlrpc] [488118.104018] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [488118.111826] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] [488118.118248] [] kthread+0xd1/0xe0 [488118.123250] [] ret_from_fork_nospec_begin+0xe/0x21 [488118.129816] [] 0xffffffffffffffff [488118.134928] LustreError: dumping log to /tmp/lustre-log.1550005406.63807 [488146.688702] Pid: 57413, comm: mdt01_052 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 [488146.698530] Call Trace: [488146.701085] [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] [488146.708111] [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] [488146.715312] [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] [488146.722080] [] osp_md_object_lock+0x162/0x2d0 [osp] [488146.728729] [] lod_object_lock+0xf3/0x7b0 [lod] [488146.735058] [] 
mdd_object_lock+0x3e/0xe0 [mdd] [488146.741269] [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] [488146.748621] [] mdt_remote_object_lock+0x2a/0x30 [mdt] [488146.755463] [] mdt_reint_rename_or_migrate.isra.51+0x362/0x860 [mdt] [488146.763597] [] mdt_reint_rename+0x13/0x20 [mdt] [488146.769920] [] mdt_reint_rec+0x83/0x210 [mdt] [488146.776051] [] mdt_reint_internal+0x6e3/0xaf0 [mdt] [488146.782734] [] mdt_reint+0x67/0x140 [mdt] [488146.788531] [] tgt_request_handle+0xaea/0x1580 [ptlrpc] [488146.795584] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [488146.803396] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] [488146.809824] [] kthread+0xd1/0xe0 [488146.814827] [] ret_from_fork_nospec_begin+0xe/0x21 [488146.821416] [] 0xffffffffffffffff [488146.826528] LustreError: dumping log to /tmp/lustre-log.1550005435.57413 [488253.186382] Pid: 54680, comm: mdt00_009 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 [488253.196213] Call Trace: [488253.198766] [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] [488253.205814] [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] [488253.213011] [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] [488253.219770] [] osp_md_object_lock+0x162/0x2d0 [osp] [488253.226461] [] lod_object_lock+0xf3/0x7b0 [lod] [488253.232763] [] mdd_object_lock+0x3e/0xe0 [mdd] [488253.239011] [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] [488253.246359] [] mdt_remote_object_lock+0x2a/0x30 [mdt] [488253.253205] [] mdt_reint_rename_or_migrate.isra.51+0x362/0x860 [mdt] [488253.261337] [] mdt_reint_rename+0x13/0x20 [mdt] [488253.267646] [] mdt_reint_rec+0x83/0x210 [mdt] [488253.273782] [] mdt_reint_internal+0x6e3/0xaf0 [mdt] [488253.280438] [] mdt_reint+0x67/0x140 [mdt] [488253.286229] [] tgt_request_handle+0xaea/0x1580 [ptlrpc] [488253.293291] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [488253.301101] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] [488253.307521] [] kthread+0xd1/0xe0 [488253.312540] [] ret_from_fork_nospec_begin+0xe/0x21 [488253.319117] [] 
0xffffffffffffffff [488253.324227] LustreError: dumping log to /tmp/lustre-log.1550005541.54680 [488300.194858] Lustre: fir-MDT0000: Received new LWP connection from 10.0.10.52@o2ib7, removing former export from same NID [488300.205814] Lustre: Skipped 2 previous similar messages [488305.411706] Lustre: 54218:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-150), not sending early reply req@ffff89945374aa00 x1624674080926720/t0(0) o36->116e39d9-c33f-c321-735f-9ea512fa0ba7@10.9.101.23@o2ib4:413/0 lens 560/2888 e 0 to 0 dl 1550005598 ref 2 fl Interpret:/0/0 rc 0/0 [488305.440954] Lustre: 54218:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 12 previous similar messages [488335.108440] Pid: 56100, comm: mdt01_050 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 [488335.118264] Call Trace: [488335.120830] [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] [488335.127857] [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] [488335.135049] [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] [488335.141826] [] osp_md_object_lock+0x162/0x2d0 [osp] [488335.148475] [] lod_object_lock+0xf3/0x7b0 [lod] [488335.154800] [] mdd_object_lock+0x3e/0xe0 [mdd] [488335.161039] [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] [488335.168406] [] mdt_remote_object_lock+0x2a/0x30 [mdt] [488335.175252] [] mdt_reint_rename_or_migrate.isra.51+0x362/0x860 [mdt] [488335.183384] [] mdt_reint_rename+0x13/0x20 [mdt] [488335.189722] [] mdt_reint_rec+0x83/0x210 [mdt] [488335.195856] [] mdt_reint_internal+0x6e3/0xaf0 [mdt] [488335.202528] [] mdt_reint+0x67/0x140 [mdt] [488335.208318] [] tgt_request_handle+0xaea/0x1580 [ptlrpc] [488335.215372] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [488335.223174] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] [488335.229602] [] kthread+0xd1/0xe0 [488335.234621] [] ret_from_fork_nospec_begin+0xe/0x21 [488335.241198] [] 0xffffffffffffffff [488335.246309] LustreError: dumping log to /tmp/lustre-log.1550005623.56100 [488335.583502] Pid: 
56099, comm: mdt01_049 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 [488335.593333] Call Trace: [488335.595907] [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] [488335.602941] [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] [488335.610172] [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] [488335.616947] [] osp_md_object_lock+0x162/0x2d0 [osp] [488335.623601] [] lod_object_lock+0xf3/0x7b0 [lod] [488335.629928] [] mdd_object_lock+0x3e/0xe0 [mdd] [488335.636167] [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] [488335.643519] [] mdt_remote_object_lock+0x2a/0x30 [mdt] [488335.650363] [] mdt_reint_rename_or_migrate.isra.51+0x362/0x860 [mdt] [488335.658496] [] mdt_reint_rename+0x13/0x20 [mdt] [488335.664821] [] mdt_reint_rec+0x83/0x210 [mdt] [488335.670959] [] mdt_reint_internal+0x6e3/0xaf0 [mdt] [488335.677614] [] mdt_reint+0x67/0x140 [mdt] [488335.683402] [] tgt_request_handle+0xaea/0x1580 [ptlrpc] [488335.690457] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [488335.698267] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] [488335.704696] [] kthread+0xd1/0xe0 [488335.709704] [] ret_from_fork_nospec_begin+0xe/0x21 [488335.716286] [] 0xffffffffffffffff [488376.069465] LNet: Service thread pid 54707 was inactive for 1202.31s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
[488376.082497] LNet: Skipped 1 previous similar message [488376.087559] LustreError: dumping log to /tmp/lustre-log.1550005664.54707 [488404.187175] Lustre: fir-MDT0000-osp-MDT0002: Connection to fir-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete [488404.202228] LustreError: 54709:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1550005392, 300s ago), entering recovery for fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff8997f5a13180/0x1a4b7ac768fd34d3 lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 54 type: IBT flags: 0x1000001000000 nid: local remote: 0x1a4b7ac768fd34ef expref: -99 pid: 54709 timeout: 0 lvb_type: 0 [488404.242389] LustreError: 54709:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 9 previous similar messages [488449.799314] Pid: 60306, comm: mdt01_056 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 [488449.809140] Call Trace: [488449.811691] [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] [488449.818716] [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] [488449.825908] [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] [488449.832669] [] osp_md_object_lock+0x162/0x2d0 [osp] [488449.839317] [] lod_object_lock+0xf3/0x7b0 [lod] [488449.845627] [] mdd_object_lock+0x3e/0xe0 [mdd] [488449.851848] [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] [488449.859216] [] mdt_remote_object_lock+0x2a/0x30 [mdt] [488449.866036] [] mdt_reint_rename_or_migrate.isra.51+0x362/0x860 [mdt] [488449.874165] [] mdt_reint_rename+0x13/0x20 [mdt] [488449.880476] [] mdt_reint_rec+0x83/0x210 [mdt] [488449.886628] [] mdt_reint_internal+0x6e3/0xaf0 [mdt] [488449.893286] [] mdt_reint+0x67/0x140 [mdt] [488449.899074] [] tgt_request_handle+0xaea/0x1580 [ptlrpc] [488449.906104] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [488449.913917] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] [488449.920334] [] kthread+0xd1/0xe0 
[488449.925335] [] ret_from_fork_nospec_begin+0xe/0x21
[488449.931897] [] 0xffffffffffffffff
[488449.937026] LustreError: dumping log to /tmp/lustre-log.1550005738.60306
[488453.895413] Pid: 54219, comm: mdt03_002 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
[488453.905246] Call Trace:
[488453.907805] [] ldlm_completion_ast+0x63d/0x920 [ptlrpc]
[488453.914835] [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc]
[488453.922037] [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc]
[488453.928796] [] osp_md_object_lock+0x162/0x2d0 [osp]
[488453.935453] [] lod_object_lock+0xf3/0x7b0 [lod]
[488453.941762] [] mdd_object_lock+0x3e/0xe0 [mdd]
[488453.948008] [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt]
[488453.955372] [] mdt_remote_object_lock+0x2a/0x30 [mdt]
[488453.962199] [] mdt_reint_rename_or_migrate.isra.51+0x362/0x860 [mdt]
[488453.970330] [] mdt_reint_rename+0x13/0x20 [mdt]
[488453.976648] [] mdt_reint_rec+0x83/0x210 [mdt]
[488453.982785] [] mdt_reint_internal+0x6e3/0xaf0 [mdt]
[488453.989433] [] mdt_reint+0x67/0x140 [mdt]
[488453.995231] [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[488454.002313] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[488454.010130] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
[488454.016558] [] kthread+0xd1/0xe0
[488454.021570] [] ret_from_fork_nospec_begin+0xe/0x21
[488454.028131] [] 0xffffffffffffffff
[488454.033235] LustreError: dumping log to /tmp/lustre-log.1550005742.54219
[488511.420865] LustreError: 68943:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1550005499, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff89942d2286c0/0x1a4b7ac769d11476 lrc: 3/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 226 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 68943 timeout: 0 lvb_type: 0
[488523.530161] Pid: 63816, comm: mdt00_049 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
[488523.539985] Call Trace:
[488523.542541] [] ldlm_completion_ast+0x63d/0x920 [ptlrpc]
[488523.549574] [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc]
[488523.556768] [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc]
[488523.563529] [] osp_md_object_lock+0x162/0x2d0 [osp]
[488523.570175] [] lod_object_lock+0xf3/0x7b0 [lod]
[488523.576492] [] mdd_object_lock+0x3e/0xe0 [mdd]
[488523.582708] [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt]
[488523.590056] [] mdt_remote_object_lock+0x2a/0x30 [mdt]
[488523.596878] [] mdt_reint_rename_or_migrate.isra.51+0x362/0x860 [mdt]
[488523.605024] [] mdt_reint_rename+0x13/0x20 [mdt]
[488523.611352] [] mdt_reint_rec+0x83/0x210 [mdt]
[488523.617480] [] mdt_reint_internal+0x6e3/0xaf0 [mdt]
[488523.624151] [] mdt_reint+0x67/0x140 [mdt]
[488523.629942] [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[488523.636979] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[488523.644804] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
[488523.651220] [] kthread+0xd1/0xe0
[488523.656220] [] ret_from_fork_nospec_begin+0xe/0x21
[488523.662774] [] 0xffffffffffffffff
[488523.667879] LustreError: dumping log to /tmp/lustre-log.1550005811.63816
[488523.872843] Pid: 56071, comm: mdt03_019 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
[488523.882670] Call Trace:
[488523.885220] [] ldlm_completion_ast+0x63d/0x920 [ptlrpc]
[488523.892248] [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc]
[488523.899456] [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc]
[488523.906230] [] osp_md_object_lock+0x162/0x2d0 [osp]
[488523.912881] [] lod_object_lock+0xf3/0x7b0 [lod]
[488523.919192] [] mdd_object_lock+0x3e/0xe0 [mdd]
[488523.925422] [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt]
[488523.932773] [] mdt_remote_object_lock+0x2a/0x30 [mdt]
[488523.939618] [] mdt_reint_rename_or_migrate.isra.51+0x362/0x860 [mdt]
[488523.947750] [] mdt_reint_rename+0x13/0x20 [mdt]
[488523.954065] [] mdt_reint_rec+0x83/0x210 [mdt]
[488523.960195] [] mdt_reint_internal+0x6e3/0xaf0 [mdt]
[488523.966843] [] mdt_reint+0x67/0x140 [mdt]
[488523.972639] [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[488523.979701] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[488523.987504] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
[488523.993916] [] kthread+0xd1/0xe0
[488523.998908] [] ret_from_fork_nospec_begin+0xe/0x21
[488524.005470] [] 0xffffffffffffffff
[488548.987811] LustreError: 63795:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1550005537, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff89868e310d80/0x1a4b7ac76a5b9e20 lrc: 3/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 226 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 63795 timeout: 0 lvb_type: 0
[488605.451221] Pid: 55384, comm: mdt00_024 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
[488605.461049] Call Trace:
[488605.463606] [] ldlm_completion_ast+0x63d/0x920 [ptlrpc]
[488605.470639] [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc]
[488605.477850] [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc]
[488605.484613] [] osp_md_object_lock+0x162/0x2d0 [osp]
[488605.491289] [] lod_object_lock+0xf3/0x7b0 [lod]
[488605.497593] [] mdd_object_lock+0x3e/0xe0 [mdd]
[488605.503820] [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt]
[488605.511174] [] mdt_remote_object_lock+0x2a/0x30 [mdt]
[488605.518014] [] mdt_reint_rename_or_migrate.isra.51+0x362/0x860 [mdt]
[488605.526165] [] mdt_reint_rename+0x13/0x20 [mdt]
[488605.532475] [] mdt_reint_rec+0x83/0x210 [mdt]
[488605.538611] [] mdt_reint_internal+0x6e3/0xaf0 [mdt]
[488605.545260] [] mdt_reint+0x67/0x140 [mdt]
[488605.551050] [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[488605.558110] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[488605.565921] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
[488605.572342] [] kthread+0xd1/0xe0
[488605.577345] [] ret_from_fork_nospec_begin+0xe/0x21
[488605.583921] [] 0xffffffffffffffff
[488605.589060] LustreError: dumping log to /tmp/lustre-log.1550005893.55384
[488605.843167] LNet: Service thread pid 63811 was inactive for 1201.01s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
[488608.779323] LustreError: 54686:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1550005597, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff896c03aac800/0x1a4b7ac76ad05de9 lrc: 3/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 227 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 54686 timeout: 0 lvb_type: 0
[488676.966885] Lustre: fir-MDT0002: Client e9560223-f857-8af8-8e66-18924c1e4b0e (at 10.8.3.22@o2ib6) reconnecting
[488676.976974] Lustre: Skipped 69 previous similar messages
[488676.982419] Lustre: fir-MDT0002: Connection restored to 6e1d62f3-a431-7ead-00c3-acc48801def9 (at 10.8.3.22@o2ib6)
[488676.992779] Lustre: Skipped 72 previous similar messages
[488757.007022] LNet: Service thread pid 56058 was inactive for 1201.33s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[488757.024150] LNet: Skipped 9 previous similar messages
[488757.029304] Pid: 56058, comm: mdt03_013 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
[488757.039134] Call Trace:
[488757.041687] [] ldlm_completion_ast+0x63d/0x920 [ptlrpc]
[488757.048714] [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc]
[488757.055907] [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc]
[488757.062684] [] osp_md_object_lock+0x162/0x2d0 [osp]
[488757.069342] [] lod_object_lock+0xf3/0x7b0 [lod]
[488757.075652] [] mdd_object_lock+0x3e/0xe0 [mdd]
[488757.081873] [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt]
[488757.089234] [] mdt_remote_object_lock+0x2a/0x30 [mdt]
[488757.096063] [] mdt_reint_rename_or_migrate.isra.51+0x362/0x860 [mdt]
[488757.104192] [] mdt_reint_rename+0x13/0x20 [mdt]
[488757.110503] [] mdt_reint_rec+0x83/0x210 [mdt]
[488757.116652] [] mdt_reint_internal+0x6e3/0xaf0 [mdt]
[488757.123312] [] mdt_reint+0x67/0x140 [mdt]
[488757.129103] [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[488757.136147] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[488757.143971] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
[488757.150385] [] kthread+0xd1/0xe0
[488757.155402] [] ret_from_fork_nospec_begin+0xe/0x21
[488757.161967] [] 0xffffffffffffffff
[488757.167087] LustreError: dumping log to /tmp/lustre-log.1550006045.56058
[488760.614121] LustreError: 54717:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1550005748, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff895824417740/0x1a4b7ac76bdc7da4 lrc: 3/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 228 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 54717 timeout: 0 lvb_type: 0
[488879.890109] Pid: 57445, comm: mdt03_028 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
[488879.899935] Call Trace:
[488879.902492] [] ldlm_completion_ast+0x63d/0x920 [ptlrpc]
[488879.909524] [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc]
[488879.916735] [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc]
[488879.923506] [] osp_md_object_lock+0x162/0x2d0 [osp]
[488879.930178] [] lod_object_lock+0xf3/0x7b0 [lod]
[488879.936496] [] mdd_object_lock+0x3e/0xe0 [mdd]
[488879.942727] [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt]
[488879.950086] [] mdt_remote_object_lock+0x2a/0x30 [mdt]
[488879.956939] [] mdt_reint_rename_or_migrate.isra.51+0x362/0x860 [mdt]
[488879.965101] [] mdt_reint_rename+0x13/0x20 [mdt]
[488879.971440] [] mdt_reint_rec+0x83/0x210 [mdt]
[488879.977579] [] mdt_reint_internal+0x6e3/0xaf0 [mdt]
[488879.984235] [] mdt_reint+0x67/0x140 [mdt]
[488879.990023] [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[488879.997124] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[488880.004938] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
[488880.011402] [] kthread+0xd1/0xe0
[488880.016414] [] ret_from_fork_nospec_begin+0xe/0x21
[488880.023007] [] 0xffffffffffffffff
[488880.028115] LustreError: dumping log to /tmp/lustre-log.1550006168.57445
[488880.275158] Pid: 54674, comm: mdt02_009 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
[488880.284986] Call Trace:
[488880.287544] [] ldlm_completion_ast+0x63d/0x920 [ptlrpc]
[488880.294579] [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc]
[488880.301789] [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc]
[488880.308566] [] osp_md_object_lock+0x162/0x2d0 [osp]
[488880.315230] [] lod_object_lock+0xf3/0x7b0 [lod]
[488880.321548] [] mdd_object_lock+0x3e/0xe0 [mdd]
[488880.327771] [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt]
[488880.335121] [] mdt_remote_object_lock+0x2a/0x30 [mdt]
[488880.341949] [] mdt_reint_rename_or_migrate.isra.51+0x362/0x860 [mdt]
[488880.350081] [] mdt_reint_rename+0x13/0x20 [mdt]
[488880.356407] [] mdt_reint_rec+0x83/0x210 [mdt]
[488880.362552] [] mdt_reint_internal+0x6e3/0xaf0 [mdt]
[488880.369217] [] mdt_reint+0x67/0x140 [mdt]
[488880.375006] [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[488880.382044] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[488880.389871] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
[488880.396284] [] kthread+0xd1/0xe0
[488880.401296] [] ret_from_fork_nospec_begin+0xe/0x21
[488880.407880] [] 0xffffffffffffffff
[488888.082320] Pid: 54699, comm: mdt01_023 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
[488888.092146] Call Trace:
[488888.094705] [] ldlm_completion_ast+0x5b1/0x920 [ptlrpc]
[488888.101738] [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc]
[488888.108931] [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc]
[488888.115708] [] osp_md_object_lock+0x162/0x2d0 [osp]
[488888.122372] [] lod_object_lock+0xf3/0x7b0 [lod]
[488888.128675] [] mdd_object_lock+0x3e/0xe0 [mdd]
[488888.134888] [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt]
[488888.142237] [] mdt_remote_object_lock+0x2a/0x30 [mdt]
[488888.149059] [] mdt_reint_rename_or_migrate.isra.51+0x362/0x860 [mdt]
[488888.157180] [] mdt_reint_rename+0x13/0x20 [mdt]
[488888.163506] [] mdt_reint_rec+0x83/0x210 [mdt]
[488888.169651] [] mdt_reint_internal+0x6e3/0xaf0 [mdt]
[488888.176302] [] mdt_reint+0x67/0x140 [mdt]
[488888.182097] [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[488888.189134] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[488888.196935] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
[488888.203349] [] kthread+0xd1/0xe0
[488888.208350] [] ret_from_fork_nospec_begin+0xe/0x21
[488888.214903] [] 0xffffffffffffffff
[488888.220024] LustreError: dumping log to /tmp/lustre-log.1550006176.54699
[488893.058816] Lustre: fir-MDT0000: Received new LWP connection from 10.0.10.52@o2ib7, removing former export from same NID
[488893.069780] Lustre: Skipped 1 previous similar message
[488952.839940] Lustre: 56065:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-150), not sending early reply req@ffff899450e56f00 x1624699750411024/t0(0) o36->4f2a8620-81f0-e31b-fbad-a029c3256423@10.9.105.43@o2ib4:306/0 lens 528/2888 e 0 to 0 dl 1550006246 ref 2 fl Interpret:/0/0 rc 0/0
[488952.869375] Lustre: 56065:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 3 previous similar messages
[488987.490803] Lustre: fir-MDT0000-osp-MDT0002: Connection to fir-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete
[489027.861816] Pid: 56134, comm: mdt00_034 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
[489027.871645] Call Trace:
[489027.874205] [] ldlm_completion_ast+0x5b1/0x920 [ptlrpc]
[489027.881242] [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc]
[489027.888468] [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc]
[489027.895231] [] osp_md_object_lock+0x162/0x2d0 [osp]
[489027.901878] [] lod_object_lock+0xf3/0x7b0 [lod]
[489027.908187] [] mdd_object_lock+0x3e/0xe0 [mdd]
[489027.914417] [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt]
[489027.921769] [] mdt_remote_object_lock+0x2a/0x30 [mdt]
[489027.928614] [] mdt_reint_rename_or_migrate.isra.51+0x362/0x860 [mdt]
[489027.936744] [] mdt_reint_rename+0x13/0x20 [mdt]
[489027.943055] [] mdt_reint_rec+0x83/0x210 [mdt]
[489027.949192] [] mdt_reint_internal+0x6e3/0xaf0 [mdt]
[489027.955876] [] mdt_reint+0x67/0x140 [mdt]
[489027.961671] [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[489027.968724] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[489027.976534] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
[489027.983051] [] kthread+0xd1/0xe0
[489027.988191] [] ret_from_fork_nospec_begin+0xe/0x21
[489027.994831] [] 0xffffffffffffffff
[489028.000020] LustreError: dumping log to /tmp/lustre-log.1550006316.56134
[489127.375333] LustreError: 56134:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1550006115, 300s ago), entering recovery for fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff896fd67bad00/0x1a4b7ac76eb6c89a lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 58 type: IBT flags: 0x1000001000000 nid: local remote: 0x1a4b7ac76eb6c8a1 expref: -99 pid: 56134 timeout: 0 lvb_type: 0
[489127.415520] LustreError: 56134:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 4 previous similar messages
[489158.177725] Lustre: DEBUG MARKER: Tue Feb 12 13:20:46 2019
[489191.193918] Pid: 54678, comm: mdt02_010 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
[489191.203748] Call Trace:
[489191.206310] [] ldlm_completion_ast+0x5b1/0x920 [ptlrpc]
[489191.213348] [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc]
[489191.220558] [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc]
[489191.227334] [] osp_md_object_lock+0x162/0x2d0 [osp]
[489191.234001] [] lod_object_lock+0xf3/0x7b0 [lod]
[489191.240312] [] mdd_object_lock+0x3e/0xe0 [mdd]
[489191.246539] [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt]
[489191.253892] [] mdt_remote_object_lock+0x2a/0x30 [mdt]
[489191.260722] [] mdt_reint_rename_or_migrate.isra.51+0x362/0x860 [mdt]
[489191.268866] [] mdt_reint_rename+0x13/0x20 [mdt]
[489191.275177] [] mdt_reint_rec+0x83/0x210 [mdt]
[489191.281314] [] mdt_reint_internal+0x6e3/0xaf0 [mdt]
[489191.287975] [] mdt_reint+0x67/0x140 [mdt]
[489191.293759] [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[489191.300820] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[489191.308622] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
[489191.315053] [] kthread+0xd1/0xe0
[489191.320054] [] ret_from_fork_nospec_begin+0xe/0x21
[489191.326629] [] 0xffffffffffffffff
[489191.331739] LustreError: dumping log to /tmp/lustre-log.1550006479.54678
[489278.075988] Lustre: fir-MDT0002: Client e9560223-f857-8af8-8e66-18924c1e4b0e (at 10.8.3.22@o2ib6) reconnecting
[489278.086143] Lustre: Skipped 77 previous similar messages
[489278.091647] Lustre: fir-MDT0002: Connection restored to 6e1d62f3-a431-7ead-00c3-acc48801def9 (at 10.8.3.22@o2ib6)
[489278.102094] Lustre: Skipped 82 previous similar messages
[489290.727417] Lustre: fir-MDT0000-osp-MDT0002: Connection to fir-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete
[489305.884797] Pid: 54709, comm: mdt03_010 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
[489305.894632] Call Trace:
[489305.897188] [] ldlm_completion_ast+0x63d/0x920 [ptlrpc]
[489305.904223] [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc]
[489305.911416] [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc]
[489305.918177] [] osp_md_object_lock+0x162/0x2d0 [osp]
[489305.924832] [] lod_object_lock+0xf3/0x7b0 [lod]
[489305.931141] [] mdd_object_lock+0x3e/0xe0 [mdd]
[489305.937365] [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt]
[489305.944714] [] mdt_remote_object_lock+0x2a/0x30 [mdt]
[489305.951558] [] mdt_reint_rename_or_migrate.isra.51+0x362/0x860 [mdt]
[489305.959689] [] mdt_reint_rename+0x13/0x20 [mdt]
[489305.966000] [] mdt_reint_rec+0x83/0x210 [mdt]
[489305.972137] [] mdt_reint_internal+0x6e3/0xaf0 [mdt]
[489305.978803] [] mdt_reint+0x67/0x140 [mdt]
[489305.984582] [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[489305.991642] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[489305.999455] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
[489306.005881] [] kthread+0xd1/0xe0
[489306.010885] [] ret_from_fork_nospec_begin+0xe/0x21
[489306.017461] [] 0xffffffffffffffff
[489306.022571] LustreError: dumping log to /tmp/lustre-log.1550006594.54709
[489393.423173] LustreError: 11-0: fir-MDT0003-osp-MDT0002: operation mds_statfs to node 10.0.10.52@o2ib7 failed: rc = -107
[489393.434043] LustreError: Skipped 83 previous similar messages
[489399.781164] Lustre: Failing over fir-MDT0002
[489399.813558] Lustre: fir-MDT0002: Not available for connect from 10.8.7.15@o2ib6 (stopping)
[489399.821915] Lustre: Skipped 7 previous similar messages
[489399.842199] LustreError: 10984:0:(osp_dev.c:485:osp_disconnect()) fir-MDT0002-osp-MDT0000: can't disconnect: rc = -19
[489399.844714] LNet: Service thread pid 54713 completed after 2933.29s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
[489399.844964] LustreError: 54720:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8962f3746800 ns: mdt-fir-MDT0002_UUID lock: ffff8970caee0480/0x1a4b7ac772e3b1c2 lrc: 3/0,0 mode: PR/PR res: [0x2c0003303:0x338:0x0].0x0 bits 0x1b/0x0 rrc: 12 type: IBT flags: 0x50200000000000 nid: 10.9.112.11@o2ib4 remote: 0xfb553d26d8599915 expref: 419 pid: 54720 timeout: 0 lvb_type: 0
[489399.860076] Lustre: 54657:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (61910:183503s); client may timeout. req@ffff8997bc2db600 x1624836094636288/t30146813153(0) o36->bef16258-699d-0e14-bdeb-b454fac00d89@10.9.112.15@o2ib4:555/0 lens 504/424 e 24 to 0 dl 1549823185 ref 1 fl Complete:/0/0 rc -19/-19
[489399.860079] Lustre: 54657:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 37 previous similar messages
[489399.868125] LustreError: 56089:0:(mdt_reint.c:2603:mdt_reint_rename_or_migrate()) fir-MDT0002: can't lock FS for rename: rc = -5
[489399.868128] LustreError: 56137:0:(mdt_reint.c:2603:mdt_reint_rename_or_migrate()) fir-MDT0002: can't lock FS for rename: rc = -5
[489399.868130] LustreError: 56089:0:(mdt_reint.c:2603:mdt_reint_rename_or_migrate()) Skipped 1 previous similar message
[489399.868131] LustreError: 56137:0:(mdt_reint.c:2603:mdt_reint_rename_or_migrate()) Skipped 1 previous similar message
[489399.869856] LustreError: 10983:0:(ldlm_resource.c:1146:ldlm_resource_complain()) fir-MDT0000-osp-MDT0002: namespace resource [0x200000004:0x1:0x0].0x0 (ffff896ef8a9d5c0) refcount nonzero (57) after lock cleanup; forcing cleanup.
[489399.870771] LustreError: 56371:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.0.82@o2ib6 arrived at 1550006688 with bad export cookie 1894743047037876449
[489399.870774] LustreError: 56371:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 1085 previous similar messages
[489399.870786] LustreError: 56371:0:(ldlm_lock.c:2689:ldlm_lock_dump_handle()) ### ### ns: mdt-fir-MDT0000_UUID lock: ffff897d41e74a40/0x1a4b7ac772df4d62 lrc: 3/0,0 mode: PR/PR res: [0x200000007:0x1:0x0].0x0 bits 0x13/0x0 rrc: 977 type: IBT flags: 0x40200000000000 nid: 10.8.0.82@o2ib6 remote: 0x6ea4d8d5c782eddd expref: 3 pid: 54700 timeout: 0 lvb_type: 0
[489400.071113] LustreError: 10984:0:(lod_dev.c:265:lod_sub_process_config()) fir-MDT0000-mdtlov: error cleaning up LOD index 2: cmd 0xcf031: rc = -19
[489400.130776] LustreError: 10984:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff897c81f66300 x1624928643118944/t0(0) o1000->fir-MDT0001-osp-MDT0000@10.0.10.52@o2ib7:24/4 lens 304/4320 e 0 to 0 dl 0 ref 2 fl Rpc:/0/ffffffff rc 0/-1
[489400.153027] LustreError: 10984:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 63494 previous similar messages
[489400.186199] LustreError: 10984:0:(osp_object.c:594:osp_attr_get()) fir-MDT0001-osp-MDT0000:osp_attr_get update error [0x20000000a:0x1:0x0]: rc = -5
[489400.226192] LustreError: 10984:0:(llog_cat.c:424:llog_cat_close()) fir-MDT0001-osp-MDT0000: failure destroying log during cleanup: rc = -5
[489400.332784] Lustre: fir-MDT0002: Not available for connect from 10.8.7.30@o2ib6 (stopping)
[489400.341140] Lustre: Skipped 166 previous similar messages
[489400.501616] LustreError: 56359:0:(ldlm_lock.c:2689:ldlm_lock_dump_handle()) ### ### ns: mdt-fir-MDT0000_UUID lock: ffff895b55b2c380/0x1a4b7ac74fe81ebf lrc: 3/0,0 mode: PR/PR res: [0x200003797:0xd4a:0x0].0x0 bits 0x20/0x0 rrc: 5 type: IBT flags: 0x40200000000000 nid: 10.9.107.25@o2ib4 remote: 0xd0588608d472dfaf expref: 1483 pid: 56064 timeout: 0 lvb_type: 0
[489400.533213] LustreError: 56359:0:(ldlm_lock.c:2689:ldlm_lock_dump_handle()) Skipped 3 previous similar messages
[489400.736703] LustreError: 54212:0:(osp_object.c:594:osp_attr_get()) fir-MDT0001-osp-MDT0000:osp_attr_get update error [0x240000406:0x138:0x0]: rc = -5
[489400.736992] LNet: Service thread pid 54701 completed after 2763.10s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
[489400.736994] LNet: Skipped 69 previous similar messages
[489400.771740] LustreError: 54212:0:(osp_object.c:594:osp_attr_get()) Skipped 5 previous similar messages
[489401.047714] LustreError: 56359:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.9.113.7@o2ib4 arrived at 1550006689 with bad export cookie 1894743047037870625
[489401.063274] LustreError: 56359:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 9 previous similar messages
[489401.333517] Lustre: fir-MDT0000: Not available for connect from 10.9.102.62@o2ib4 (stopping)
[489401.342050] Lustre: Skipped 144 previous similar messages
[489401.784771] LustreError: 56417:0:(ldlm_lock.c:2689:ldlm_lock_dump_handle()) ### ### ns: mdt-fir-MDT0002_UUID lock: ffff8976c86fa1c0/0x1a4b7ac76b7155a0 lrc: 3/0,0 mode: PR/PR res: [0x2c00016c4:0xeac:0x0].0x0 bits 0x40/0x0 rrc: 67762 type: IBT flags: 0x40000000000000 nid: 10.9.103.11@o2ib4 remote: 0x2e5f1e62a350ed14 expref: 67770 pid: 63819 timeout: 0 lvb_type: 0
[489401.816803] LustreError: 56417:0:(ldlm_lock.c:2689:ldlm_lock_dump_handle()) Skipped 9 previous similar messages
[489403.186293] LustreError: 56371:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.9.102.21@o2ib4 arrived at 1550006691 with bad export cookie 1894743047037862890
[489403.201937] LustreError: 56371:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 11 previous similar messages
[489403.347979] Lustre: fir-MDT0002: Not available for connect from 10.9.101.57@o2ib4 (stopping)
[489403.356510] Lustre: Skipped 248 previous similar messages
[489403.964373] LustreError: 56359:0:(ldlm_lock.c:2689:ldlm_lock_dump_handle()) ### ### ns: mdt-fir-MDT0002_UUID lock: ffff896363aacec0/0x1a4b7ac76e1e79c1 lrc: 3/0,0 mode: PR/PR res: [0x2c00016a7:0x8d:0x0].0x0 bits 0x40/0x0 rrc: 106960 type: IBT flags: 0x40000000000000 nid: 10.9.104.71@o2ib4 remote: 0x965623ab2d947206 expref: 106966 pid: 56092 timeout: 0 lvb_type: 0
[489403.996484] LustreError: 56359:0:(ldlm_lock.c:2689:ldlm_lock_dump_handle()) Skipped 13 previous similar messages
[489405.036333] LNet: Service thread pid 63791 completed after 2902.76s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
[489405.052679] LNet: Skipped 6 previous similar messages
[489407.295088] LustreError: 56417:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.9.105.19@o2ib4 arrived at 1550006695 with bad export cookie 1894743047037865305
[489407.310736] LustreError: 56417:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 31 previous similar messages
[489407.350746] Lustre: fir-MDT0000: Not available for connect from 10.8.7.14@o2ib6 (stopping)
[489407.359106] Lustre: Skipped 534 previous similar messages
[489407.484130] LustreError: 10983:0:(osp_object.c:594:osp_attr_get()) fir-MDT0003-osp-MDT0002:osp_attr_get update error [0x20000000a:0x3:0x0]: rc = -5
[489407.515421] LustreError: 10983:0:(llog_cat.c:424:llog_cat_close()) fir-MDT0003-osp-MDT0002: failure destroying log during cleanup: rc = -5
[489407.527934] LustreError: 10983:0:(llog_cat.c:424:llog_cat_close()) Skipped 5 previous similar messages
[489409.218003] LustreError: 56371:0:(ldlm_lock.c:2689:ldlm_lock_dump_handle()) ### ### ns: mdt-fir-MDT0002_UUID lock: ffff896cff61ee40/0x1a4b7ac76dca1425 lrc: 3/0,0 mode: PR/PR res: [0x2c0002b4b:0xc9:0x0].0x0 bits 0x40/0x0 rrc: 81532 type: IBT flags: 0x40000000000000 nid: 10.9.105.12@o2ib4 remote: 0x2dcb7cb2ac6d2c25 expref: 81544 pid: 54710 timeout: 0 lvb_type: 0
[489409.249933] LustreError: 56371:0:(ldlm_lock.c:2689:ldlm_lock_dump_handle()) Skipped 25 previous similar messages
[489411.842199] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.8.22.13@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server.
[489411.859569] LustreError: Skipped 1 previous similar message
[489412.353772] Lustre: server umount fir-MDT0000 complete
[489412.401967] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.101.38@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server.
[489412.419424] LustreError: Skipped 21 previous similar messages
[489413.411526] LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.9.101.4@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server.
[489413.428894] LustreError: Skipped 71 previous similar messages
[489415.354512] Lustre: fir-MDT0002: Not available for connect from 10.8.22.18@o2ib6 (stopping)
[489415.362949] Lustre: Skipped 614 previous similar messages
[489415.412688] LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.8.2.1@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server.
[489415.429882] LustreError: Skipped 128 previous similar messages
[489417.747505] Lustre: server umount fir-MDT0002 complete
[489423.683702] LustreError: 54045:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.9.107.26@o2ib4 arrived at 1550006711 with bad export cookie 1894743047037883211
[489423.699356] LustreError: 54045:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 27 previous similar messages
[489439.789383] LustreError: 56423:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.27.24@o2ib6 arrived at 1550006728 with bad export cookie 1894743047037882476
[489439.804957] LustreError: 56423:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 3 previous similar messages
[489440.070280] LDISKFS-fs (dm-0): file extents enabled, maximum tree depth=5
[489440.122267] LDISKFS-fs (dm-4): file extents enabled, maximum tree depth=5
[489440.292755] LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,acl,no_mbcache,nodelalloc
[489440.319671] LDISKFS-fs (dm-4): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,acl,no_mbcache,nodelalloc
[489440.733554] LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.8.8@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server.
[489440.750753] LustreError: Skipped 159 previous similar messages
[489441.070800] Lustre: fir-MDT0002: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900
[489441.208479] Lustre: fir-MDD0002: changelog on
[489441.217479] Lustre: fir-MDT0002: in recovery but waiting for the first client to connect
[489441.422019] LustreError: 11-0: fir-MDT0002-osp-MDT0000: operation mds_connect to node 0@lo failed: rc = -114
[489441.431935] LustreError: Skipped 5 previous similar messages
[489441.740547] Lustre: fir-MDD0000: changelog on
[489441.749465] Lustre: fir-MDT0000: in recovery but waiting for the first client to connect
[489441.785519] Lustre: fir-MDT0002: Will be in recovery for at least 2:30, or until 1347 clients reconnect
[489442.524788] Lustre: fir-MDT0000: Will be in recovery for at least 2:30, or until 1347 clients reconnect
[489446.123616] LustreError: 11453:0:(mdt_open.c:1364:mdt_reint_open()) @@@ OPEN & CREAT not in open replay/by_fid. req@ffff89979cb91b00 x1624702413102768/t0(30601294436) o101->fe789eb3-1cd9-3594-b889-6606ba1b8e4a@10.9.113.2@o2ib4:39/0 lens 1784/3288 e 0 to 0 dl 1550007489 ref 1 fl Interpret:/4/0 rc 0/0
[489448.877398] LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.9.103.36@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server.
[489448.894851] LustreError: Skipped 53 previous similar messages
[489466.529003] LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server.
[489466.545333] LustreError: Skipped 4 previous similar messages
[489491.617621] Lustre: 11887:0:(ldlm_lib.c:1771:extend_recovery_timer()) fir-MDT0000: extended recovery timer reaching hard limit: 900, extend: 1
[489491.630484] Lustre: 11887:0:(ldlm_lib.c:1771:extend_recovery_timer()) Skipped 2 previous similar messages
[489492.713037] Lustre: fir-MDT0000: Recovery over after 0:50, of 1349 clients 1349 recovered and 0 were evicted.
[489492.723037] Lustre: Skipped 1 previous similar message
[489669.969943] Lustre: 12197:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1550006951/real 1550006951] req@ffff8997f7e87200 x1624928644857536/t0(0) o104->fir-MDT0002@10.8.15.7@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1550006958 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[489669.997198] Lustre: 12197:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 11 previous similar messages
[489691.007470] Lustre: 12197:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1550006972/real 1550006972] req@ffff8997f7e87200 x1624928644857536/t0(0) o104->fir-MDT0002@10.8.15.7@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1550006979 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[489691.034719] Lustre: 12197:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
[489724.795018] Lustre: MGS: haven't heard from client f302bc6f-3526-3af0-99f2-05804f8fcca5 (at 10.8.15.7@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89959db92800, cur 1550007013 expire 1550006863 last 1550006786
[489724.816373] Lustre: Skipped 2 previous similar messages
[489726.047349] Lustre: 12197:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1550007007/real 1550007007] req@ffff8997f7e87200 x1624928644857536/t0(0) o104->fir-MDT0002@10.8.15.7@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1550007014 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
[489726.074605] Lustre: 12197:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 11 previous similar messages
[489740.790140] Lustre: fir-MDT0000: haven't heard from client 35e43627-4559-6e68-aa0d-2f084505361b (at 10.8.15.7@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899402319400, cur 1550007029 expire 1550006879 last 1550006802
[489743.789806] Lustre: fir-MDT0002: haven't heard from client 35e43627-4559-6e68-aa0d-2f084505361b (at 10.8.15.7@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8983a0ba8800, cur 1550007032 expire 1550006882 last 1550006805
[489743.826365] LustreError: 12156:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8962f3743000 x1624928645651312/t0(0) o104->fir-MDT0002@10.8.15.7@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
[489743.847451] LustreError: 12156:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 24 previous similar messages
[490734.386264] Lustre: 11937:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22
[491093.861750] Lustre: 12290:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22
[491788.068753] Lustre: 12290:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22
[491914.844335] Lustre: fir-MDT0002: haven't heard from client 3ba4f6ab-da5b-8d5b-7e10-cb73b415cd02 (at 10.8.3.11@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897eb4a5b400, cur 1550009203 expire 1550009053 last 1550008976
[492039.430805] Lustre: MGS: Connection restored to ebca69ce-60cf-b682-0b00-8cb081d19aed (at 10.8.3.11@o2ib6)
[492039.440472] Lustre: Skipped 2810 previous similar messages
[492159.982945] Lustre: 12172:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22
[492435.728271] Lustre: 12266:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22
[492446.903590] Lustre: 12318:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22
[492446.915327] Lustre: 12318:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 7 previous similar messages
[492607.567958] Lustre: 12265:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22
[492607.579696] Lustre: 12265:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message
[493274.139887] Lustre: 12420:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22
[493274.151630] Lustre: 12420:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 10 previous similar messages
[493605.497819] Lustre: 12336:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22
[493605.509583] Lustre: 12336:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message
[493634.668485] LustreError: 11493:0:(mdt_io.c:442:mdt_preprw_write()) fir-MDT0002: BRW to missing obj [0x2c0003374:0x2c:0x0]
[493844.290941] Lustre: 12285:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22
[494725.443400] Lustre: 12294:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1550012006/real 1550012006] req@ffff896c55762a00 x1624928883014976/t0(0) o104->fir-MDT0002@10.8.19.6@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1550012013 ref 1 fl Rpc:RX/0/ffffffff rc 0/-1
[494725.470766] Lustre: 12294:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 7 previous similar messages
[494725.486342] Lustre: fir-MDT0000: Client ed540f5a-5df6-5998-8f5c-40181564f690 (at 10.9.106.50@o2ib4) reconnecting
[494725.496599] Lustre: Skipped 3 previous similar messages
[494725.499670] Lustre: fir-MDT0002: Connection restored to cdeb9a20-d3dc-d919-0f35-34da2b8a5abc (at 10.9.101.41@o2ib4)
[494725.499672] Lustre: Skipped 2 previous similar messages
[494726.075704] Lustre: 56353:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:3s); client may timeout. req@ffff8985a1bcd700 x1624732778452928/t0(0) o103->4e48592f-b97d-5c93-9da4-86c872d7a486@10.9.107.43@o2ib4:31/0 lens 328/0 e 0 to 0 dl 1550012011 ref 2 fl Interpret:H/0/ffffffff rc 0/-1
[494726.104687] Lustre: 56353:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1 previous similar message
[494726.126388] Lustre: MGS: Received new LWP connection from 10.8.20.17@o2ib6, removing former export from same NID
[494726.136679] Lustre: Skipped 3 previous similar messages
[494726.231817] LustreError: 13225:0:(tgt_handler.c:644:process_req_last_xid()) @@@ Unexpected xid 5c5a1c9add490 vs. last_xid 5c5a1c9add79f req@ffff8986736d3600 x1624673547572368/t0(0) o35->37b4854b-e93e-85a5-e644-9d0c6be8cc09@10.8.2.29@o2ib6:40/0 lens 392/0 e 0 to 0 dl 1550012020 ref 1 fl Interpret:/2/ffffffff rc 0/-1
[494726.331985] Lustre: fir-MDT0003-osp-MDT0002: Connection to fir-MDT0003 (at 10.0.10.52@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
[494726.348075] Lustre: Skipped 4 previous similar messages
[494726.600185] LustreError: 12369:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff89674e297450 x1624736209161968/t0(0) o4->efc6b332-a736-88e8-194a-588aa3e05348@10.8.21.36@o2ib6:62/0 lens 488/448 e 1 to 0 dl 1550012042 ref 1 fl Interpret:/0/0 rc 0/0
[494726.624254] Lustre: fir-MDT0002: Bulk IO write error with efc6b332-a736-88e8-194a-588aa3e05348 (at 10.8.21.36@o2ib6), client will retry: rc = -110
[494727.286763] LustreError: 12237:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8973867c9450 x1624736209162032/t0(0) o4->efc6b332-a736-88e8-194a-588aa3e05348@10.8.21.36@o2ib6:57/0 lens 488/448 e 1 to 0 dl 1550012037 ref 1 fl Interpret:/0/0 rc 0/0
[494727.310839] LustreError: 12237:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 2 previous similar messages
[494727.849830] LustreError: 12166:0:(tgt_handler.c:644:process_req_last_xid()) @@@ Unexpected xid 5c5aaeeb12940 vs.
last_xid 5c5aaeeb1294f req@ffff89833ce9d400 x1624712823253312/t0(0) o101->9ae76257-9e30-ddcb-f15b-a8db6da186f5@10.8.8.6@o2ib6:41/0 lens 1768/0 e 0 to 0 dl 1550012021 ref 1 fl Interpret:/2/ffffffff rc 0/-1 [494730.178284] LustreError: 12250:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff897491b5a050 x1624748418306000/t0(0) o4->fbb7e0da-3603-8dfb-de71-fd8cea5618ef@10.9.106.69@o2ib4:57/0 lens 488/448 e 1 to 0 dl 1550012037 ref 1 fl Interpret:/0/0 rc 0/0 [494730.202432] LustreError: 12250:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 2 previous similar messages [494730.211941] Lustre: fir-MDT0002: Bulk IO write error with fbb7e0da-3603-8dfb-de71-fd8cea5618ef (at 10.9.106.69@o2ib4), client will retry: rc = -110 [494730.225233] Lustre: Skipped 5 previous similar messages [494734.930348] Lustre: 11926:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1550012014/real 1550012014] req@ffff896cb3ee0900 x1624928883014896/t0(0) o106->fir-MDT0002@10.9.107.17@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1550012021 ref 1 fl Rpc:RX/2/ffffffff rc -11/-1 [494734.958050] Lustre: 11926:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 10 previous similar messages [494735.020363] LNetError: 53887:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [494735.030622] LNetError: 53887:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Skipped 1 previous similar message [494735.040789] LNetError: 53887:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.202@o2ib7 (6): c: 0, oc: 0, rc: 8 [494735.052859] LNetError: 53887:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Skipped 1 previous similar message [494735.063765] LNet: 53887:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.212@o2ib7: 1 seconds [494735.074016] LNet: 53887:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 7 previous similar messages [494735.415420] Lustre: 
56360:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-12s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff8986736d0c00 x1624732778453520/t0(0) o103->4e48592f-b97d-5c93-9da4-86c872d7a486@10.9.107.43@o2ib4:31/0 lens 328/0 e 0 to 0 dl 1550012011 ref 1 fl Interpret:H/0/ffffffff rc 0/-1 [494735.447091] Lustre: 56360:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 441 previous similar messages [494735.470760] LustreError: 12360:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff89805a6e5050 x1625285072210656/t0(0) o3->c3ee8e29-24b2-60ad-b950-c5ea318742ba@10.8.17.29@o2ib6:61/0 lens 488/440 e 1 to 0 dl 1550012041 ref 1 fl Interpret:/0/0 rc 0/0 [494735.494738] Lustre: fir-MDT0000: Bulk IO read error with c3ee8e29-24b2-60ad-b950-c5ea318742ba (at 10.8.17.29@o2ib6), client will retry: rc -110 [494735.507725] Lustre: fir-MDT0002: Bulk IO write error with ddfc24dc-ab35-b5d1-5ce6-6e97aa901210 (at 10.9.107.16@o2ib4), client will retry: rc = -110 [494735.521157] Lustre: fir-MDT0000: Connection restored to 02d67c93-5c7d-2116-5b9c-eaf6bd66e06a (at 10.8.29.5@o2ib6) [494735.531510] Lustre: Skipped 59 previous similar messages [494738.859223] Lustre: ldlm_canceld: This server is not able to keep up with request traffic (cpu-bound). [494738.868621] Lustre: 56361:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=2 reqQ=0 recA=6, svcEst=32, delay=10018 [494738.879199] Lustre: 56361:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-5s), not sending early reply. Consider increasing at_early_margin (5)? 
req@ffff89676baa2850 x1624732778440336/t0(0) o103->4e48592f-b97d-5c93-9da4-86c872d7a486@10.9.107.43@o2ib4:41/0 lens 496/224 e 1 to 0 dl 1550012021 ref 2 fl Interpret:H/0/0 rc 0/0 [494738.910282] Lustre: 56361:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 19 previous similar messages [494739.023365] LNetError: 53887:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [494739.033622] LNetError: 53887:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Skipped 1 previous similar message [494739.043790] LNetError: 53887:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.105@o2ib7 (11): c: 0, oc: 0, rc: 8 [494739.055945] LNetError: 53887:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Skipped 1 previous similar message [494740.712331] LNetError: 53887:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds [494740.722591] LNetError: 53887:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.212@o2ib7 (5): c: 5, oc: 0, rc: 8 [494745.542290] NMI watchdog: BUG: soft lockup - CPU#11 stuck for 22s! 
[ldlm_cn03_016:8536] [494745.550379] Modules linked in: osp(OE) mdd(OE) mdt(OE) lustre(OE) mdc(OE) lod(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin dcdbas amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ses dm_multipath ipmi_si enclosure pcspkr dm_mod sg ipmi_devintf ccp i2c_piix4 ipmi_msghandler k10temp acpi_power_meter knem(OE) ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif [494745.623380] crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper ahci syscopyarea mlx5_core(OE) sysfillrect sysimgblt libahci mlxfw(OE) fb_sys_fops devlink ttm crct10dif_pclmul tg3 crct10dif_common mlx_compat(OE) drm megaraid_sas crc32c_intel libata ptp drm_panel_orientation_quirks pps_core mpt3sas(OE) raid_class scsi_transport_sas [last unloaded: osp] [494745.656802] CPU: 11 PID: 8536 Comm: ldlm_cn03_016 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1 [494745.662292] NMI watchdog: BUG: soft lockup - CPU#25 stuck for 22s! 
[ldlm_cn01_023:56381] [494745.662343] Modules linked in: osp(OE) mdd(OE) mdt(OE) lustre(OE) mdc(OE) lod(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin dcdbas amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ses dm_multipath ipmi_si enclosure pcspkr dm_mod sg ipmi_devintf ccp i2c_piix4 ipmi_msghandler k10temp acpi_power_meter knem(OE) ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif [494745.662362] crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper ahci syscopyarea mlx5_core(OE) sysfillrect sysimgblt libahci mlxfw(OE) fb_sys_fops devlink ttm crct10dif_pclmul tg3 crct10dif_common mlx_compat(OE) drm megaraid_sas crc32c_intel libata ptp drm_panel_orientation_quirks pps_core mpt3sas(OE) raid_class scsi_transport_sas [last unloaded: osp] [494745.662366] CPU: 25 PID: 56381 Comm: ldlm_cn01_023 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1 [494745.662366] Hardware name: Dell Inc. 
PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018 [494745.662368] task: ffff8997deaab0c0 ti: ffff89966e7b4000 task.ti: ffff89966e7b4000 [494745.662376] RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x126/0x200 [494745.662377] RSP: 0018:ffff89966e7b7c38 EFLAGS: 00000246 [494745.662378] RAX: 0000000000000000 RBX: 00000001c0f61fd2 RCX: 0000000000c90000 [494745.662379] RDX: ffff89983f6db780 RSI: 0000000001790001 RDI: ffff8964b4b5cb5c [494745.662380] RBP: ffff89966e7b7c38 R08: ffff8977ff79b780 R09: 0000000000000000 [494745.662380] R10: ffff89593fc07600 R11: ffffc45231e7d800 R12: ffff89966e7b7be0 [494745.662381] R13: ffff8980de370fc0 R14: ffff89966e7b7ba0 R15: ffffffffc0c6a378 [494745.662383] FS: 00007f37fbb54700(0000) GS:ffff8977ff780000(0000) knlGS:0000000000000000 [494745.662384] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [494745.662385] CR2: 00007f76743a2000 CR3: 000000203b27e000 CR4: 00000000003407e0 [494745.662386] Call Trace: [494745.662392] [] queued_spin_lock_slowpath+0xb/0xf [494745.662396] [] _raw_spin_lock+0x20/0x30 [494745.662432] [] lock_res_and_lock+0x2c/0x50 [ptlrpc] [494745.662464] [] ldlm_lock_cancel+0x2d/0x1f0 [ptlrpc] [494745.662500] [] ldlm_request_cancel+0x19b/0x740 [ptlrpc] [494745.662535] [] ldlm_handle_cancel+0xba/0x250 [ptlrpc] [494745.662569] [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] [494745.662610] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [494745.662649] [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [494745.662652] [] ? wake_up_state+0x20/0x20 [494745.662690] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] [494745.662728] [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [494745.662731] [] kthread+0xd1/0xe0 [494745.662733] [] ? insert_kthread_work+0x40/0x40 [494745.662736] [] ret_from_fork_nospec_begin+0xe/0x21 [494745.662737] [] ? 
insert_kthread_work+0x40/0x40 [494745.662757] Code: 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 60 b9 34 9e 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 41 8b 40 08 <85> c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b 17 0f b7 c2 [494745.759294] NMI watchdog: BUG: soft lockup - CPU#36 stuck for 21s! [ldlm_cn00_018:56399] [494745.759342] Modules linked in: osp(OE) mdd(OE) mdt(OE) lustre(OE) mdc(OE) lod(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin dcdbas amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ses dm_multipath ipmi_si enclosure pcspkr dm_mod sg ipmi_devintf ccp i2c_piix4 ipmi_msghandler k10temp acpi_power_meter knem(OE) ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif [494745.759358] crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper ahci syscopyarea mlx5_core(OE) sysfillrect sysimgblt libahci mlxfw(OE) fb_sys_fops devlink ttm crct10dif_pclmul tg3 crct10dif_common mlx_compat(OE) drm megaraid_sas crc32c_intel libata ptp drm_panel_orientation_quirks pps_core mpt3sas(OE) raid_class scsi_transport_sas [last unloaded: osp] [494745.759361] CPU: 36 PID: 56399 Comm: ldlm_cn00_018 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1 [494745.759362] Hardware name: Dell Inc. 
PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018 [494745.759363] task: ffff89976bf95140 ti: ffff8997b3b9c000 task.ti: ffff8997b3b9c000 [494745.759372] RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 [494745.759373] RSP: 0018:ffff8997b3b9fbb8 EFLAGS: 00000246 [494745.759374] RAX: 0000000000000000 RBX: ffff8997b3b9fbf0 RCX: 0000000001210000 [494745.759375] RDX: ffff8977ff79b780 RSI: 0000000000c90000 RDI: ffff8964b4b5cb5c [494745.759376] RBP: ffff8997b3b9fbb8 R08: ffff8967ff05b780 R09: 0000000000000000 [494745.759376] R10: 0000000000000096 R11: 000000005c634e42 R12: 0000000000000000 [494745.759377] R13: 0000000000000000 R14: 0000000000000002 R15: ffff896baa2172c0 [494745.759379] FS: 000000000179c880(0000) GS:ffff8967ff040000(0000) knlGS:0000000000000000 [494745.759379] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [494745.759380] CR2: 0000000000449710 CR3: 0000003dc7410000 CR4: 00000000003407e0 [494745.759382] Call Trace: [494745.759388] [] queued_spin_lock_slowpath+0xb/0xf [494745.759392] [] _raw_spin_lock+0x20/0x30 [494745.759437] [] lock_res_and_lock+0x2c/0x50 [ptlrpc] [494745.759470] [] ldlm_cancel_callback+0x92/0x330 [ptlrpc] [494745.759473] [] ? native_queued_spin_lock_slowpath+0x126/0x200 [494745.759504] [] ldlm_lock_cancel+0x56/0x1f0 [ptlrpc] [494745.759540] [] ldlm_request_cancel+0x19b/0x740 [ptlrpc] [494745.759575] [] ldlm_handle_cancel+0xba/0x250 [ptlrpc] [494745.759609] [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] [494745.759651] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [494745.759689] [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [494745.759694] [] ? wake_up_state+0x20/0x20 [494745.759732] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] [494745.759769] [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [494745.759774] [] kthread+0xd1/0xe0 [494745.759775] [] ? insert_kthread_work+0x40/0x40 [494745.759779] [] ret_from_fork_nospec_begin+0xe/0x21 [494745.759781] [] ? 
insert_kthread_work+0x40/0x40 [494745.759800] Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 60 b9 34 9e 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b [494745.829296] NMI watchdog: BUG: soft lockup - CPU#44 stuck for 22s! [ldlm_cn00_026:56416] [494745.829323] Modules linked in: osp(OE) mdd(OE) mdt(OE) lustre(OE) mdc(OE) lod(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin dcdbas amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ses dm_multipath ipmi_si enclosure pcspkr dm_mod sg ipmi_devintf ccp i2c_piix4 ipmi_msghandler k10temp acpi_power_meter knem(OE) ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif [494745.829333] crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper ahci syscopyarea mlx5_core(OE) sysfillrect sysimgblt libahci mlxfw(OE) fb_sys_fops devlink ttm crct10dif_pclmul tg3 crct10dif_common mlx_compat(OE) drm megaraid_sas crc32c_intel libata ptp drm_panel_orientation_quirks pps_core mpt3sas(OE) raid_class scsi_transport_sas [last unloaded: osp] [494745.829336] CPU: 44 PID: 56416 Comm: ldlm_cn00_026 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1 [494745.829336] Hardware name: Dell Inc. 
PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018 [494745.829338] task: ffff8997fc6e1040 ti: ffff8997fc6ec000 task.ti: ffff8997fc6ec000 [494745.829342] RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 [494745.829343] RSP: 0018:ffff8997fc6efc38 EFLAGS: 00000246 [494745.829344] RAX: 0000000000000000 RBX: 00000001c0f61fd2 RCX: 0000000001610000 [494745.829344] RDX: ffff89983f61b780 RSI: 0000000001190001 RDI: ffff8964b4b5cb5c [494745.829345] RBP: ffff8997fc6efc38 R08: ffff8967ff0db780 R09: 0000000000000000 [494745.829346] R10: ffff89593fc07600 R11: 0000000000000001 R12: ffff8997fc6efbe0 [494745.829347] R13: ffff8980de370fc0 R14: ffff8997fc6efba0 R15: ffffffffc0c6a378 [494745.829348] FS: 00007f147a446740(0000) GS:ffff8967ff0c0000(0000) knlGS:0000000000000000 [494745.829349] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [494745.829350] CR2: 00007ffe631f1c40 CR3: 0000003dc7410000 CR4: 00000000003407e0 [494745.829351] Call Trace: [494745.829354] [] queued_spin_lock_slowpath+0xb/0xf [494745.829356] [] _raw_spin_lock+0x20/0x30 [494745.829387] [] lock_res_and_lock+0x2c/0x50 [ptlrpc] [494745.829419] [] ldlm_lock_cancel+0x2d/0x1f0 [ptlrpc] [494745.829453] [] ldlm_request_cancel+0x19b/0x740 [ptlrpc] [494745.829487] [] ldlm_handle_cancel+0xba/0x250 [ptlrpc] [494745.829521] [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] [494745.829560] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [494745.829598] [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [494745.829600] [] ? wake_up_state+0x20/0x20 [494745.829637] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] [494745.829674] [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [494745.829676] [] kthread+0xd1/0xe0 [494745.829678] [] ? insert_kthread_work+0x40/0x40 [494745.829680] [] ret_from_fork_nospec_begin+0xe/0x21 [494745.829681] [] ? 
insert_kthread_work+0x40/0x40 [494745.829700] Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 60 b9 34 9e 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b [494746.696611] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018 [494746.704264] task: ffff8996af340000 ti: ffff899634e1c000 task.ti: ffff899634e1c000 [494746.711828] RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 [494746.721857] RSP: 0018:ffff899634e1fc38 EFLAGS: 00000246 [494746.727256] RAX: 0000000000000000 RBX: 00000001c0f61fd2 RCX: 0000000000590000 [494746.734474] RDX: ffff8977ff81b780 RSI: 0000000001090001 RDI: ffff8964b4b5cb5c [494746.741694] RBP: ffff899634e1fc38 R08: ffff89983f49b780 R09: 0000000000000000 [494746.748913] R10: ffff89593fc07600 R11: ffffc45231e7d600 R12: ffff899634e1fbe0 [494746.756133] R13: ffff8980de370fc0 R14: ffff899634e1fba0 R15: ffffffffc0c6a378 [494746.763355] FS: 00007fd7a82ab740(0000) GS:ffff89983f480000(0000) knlGS:0000000000000000 [494746.771527] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [494746.777358] CR2: 0000000002d5bba8 CR3: 0000000cc13a8000 CR4: 00000000003407e0 [494746.784579] Call Trace: [494746.787122] [] queued_spin_lock_slowpath+0xb/0xf [494746.793475] [] _raw_spin_lock+0x20/0x30 [494746.799086] [] lock_res_and_lock+0x2c/0x50 [ptlrpc] [494746.805730] [] ldlm_lock_cancel+0x2d/0x1f0 [ptlrpc] [494746.812374] [] ldlm_request_cancel+0x19b/0x740 [ptlrpc] [494746.819366] [] ldlm_handle_cancel+0xba/0x250 [ptlrpc] [494746.826187] [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] [494746.833191] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [494746.840968] [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [494746.847842] [] ? wake_up_state+0x20/0x20 [494746.853534] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] [494746.859921] [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [494746.867401] [] kthread+0xd1/0xe0 [494746.872367] [] ? 
insert_kthread_work+0x40/0x40 [494746.878546] [] ret_from_fork_nospec_begin+0xe/0x21 [494746.885072] [] ? insert_kthread_work+0x40/0x40 [494746.891251] Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 60 b9 34 9e 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b [494747.600221] LustreError: 14095:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff89805a6e5850 x1624705179304384/t0(0) o4->bd33c9b1-bb5e-c24e-0675-22654fcc67c5@10.8.24.26@o2ib6:65/0 lens 488/448 e 1 to 0 dl 1550012045 ref 1 fl Interpret:/0/0 rc 0/0 [494747.624276] LustreError: 14095:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 1 previous similar message [494747.633724] Lustre: fir-MDT0002: Bulk IO write error with bd33c9b1-bb5e-c24e-0675-22654fcc67c5 (at 10.8.24.26@o2ib6), client will retry: rc = -110 [494749.933332] Lustre: 14074:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 14s req@ffff896c823c8050 x1624748426994560/t0(0) o4->ff962631-3204-bbf1-cb07-efdb4b779a1a@10.9.106.30@o2ib4:0/0 lens 488/0 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1 [494749.957290] Lustre: 14074:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 165 previous similar messages [494752.611289] Lustre: 11489:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 16s req@ffff896cb1f37c50 x1624736839009520/t0(0) o4->9f1de1b3-7e56-c812-677b-e9a4e7cdbca5@10.8.8.5@o2ib6:0/0 lens 488/0 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1 [494752.636099] LNetError: 53887:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds [494752.646360] LNetError: 53887:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.209@o2ib7 (0): c: 0, oc: 0, rc: 8 [494752.659477] Lustre: mdt_io: This server is not able to keep up with request traffic (cpu-bound). 
[494752.667658] Lustre: 12186:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1550012032/real 1550012032] req@ffff897f86a05100 x1624928883026736/t0(0) o104->fir-MDT0002@10.9.106.29@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1550012039 ref 1 fl Rpc:RX/2/ffffffff rc -11/-1 [494752.667661] Lustre: 12186:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 47 previous similar messages [494752.705932] Lustre: 12232:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=56 reqQ=54 recA=15, svcEst=33, delay=0 [494753.483485] NMI watchdog: BUG: soft lockup - CPU#4 stuck for 23s! [ldlm_cn00_020:56404] [494753.551487] NMI watchdog: BUG: soft lockup - CPU#12 stuck for 23s! [ldlm_cn00_012:56351] [494753.491577] Modules linked in: osp(OE) mdd(OE) mdt(OE) lustre(OE) mdc(OE) lod(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin dcdbas amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ses dm_multipath ipmi_si enclosure
[494753.551488] Modules linked in: osp(OE) mdd(OE) mdt(OE) lustre(OE) mdc(OE) lod(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin dcdbas amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ses dm_multipath ipmi_si enclosure pcspkr dm_mod sg ipmi_devintf ccp i2c_piix4 ipmi_msghandler k10temp acpi_power_meter knem(OE) ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper ahci syscopyarea mlx5_core(OE) sysfillrect sysimgblt libahci mlxfw(OE) fb_sys_fops devlink ttm crct10dif_pclmul tg3 crct10dif_common mlx_compat(OE) drm megaraid_sas crc32c_intel libata ptp drm_panel_orientation_quirks pps_core mpt3sas(OE) raid_class scsi_transport_sas [last unloaded: osp]
[494753.551537] CPU: 12 PID: 56351 Comm: ldlm_cn00_012 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1 [494753.551538] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018 [494753.551539] task: ffff8997dffd2080 ti: ffff89978a76c000 task.ti: ffff89978a76c000 [494753.551540] RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x126/0x200 [494753.551546] RSP: 0018:ffff89978a76fbb8 EFLAGS: 00000246 [494753.551547] RAX: 0000000000000000 RBX: ffff89978a76fbf0 RCX: 0000000000610000 [494753.551548] RDX: ffff8977ff69b780 RSI: 0000000000490000 RDI: ffff8964b4b5cb5c [494753.551548] RBP: ffff89978a76fbb8 R08: ffff8967feedb780 R09: 0000000000000000 [494753.551549] R10: ffff89593fc07600 R11: 0000000000000000 R12: 0000000000000000 [494753.551549] R13: 0000000000000000 R14: 0000000000000002 R15: ffff896c97be98c0 [494753.551551] FS: 00007fb564ea5700(0000) GS:ffff8967feec0000(0000) knlGS:0000000000000000 [494753.551551] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [494753.551552] CR2: 00007fc0390fe000 CR3: 0000003dc7410000 CR4: 00000000003407e0 [494753.551553] Call Trace: [494753.551557] [] queued_spin_lock_slowpath+0xb/0xf [494753.551560] [] _raw_spin_lock+0x20/0x30 [494753.551597] [] lock_res_and_lock+0x2c/0x50 [ptlrpc] [494753.551624] [] ldlm_cancel_callback+0x92/0x330 [ptlrpc] [494753.551626] [] ?
native_queued_spin_lock_slowpath+0x122/0x200 [494753.551652] [] ldlm_lock_cancel+0x56/0x1f0 [ptlrpc] [494753.551682] [] ldlm_request_cancel+0x19b/0x740 [ptlrpc] [494753.551710] [] ldlm_handle_cancel+0xba/0x250 [ptlrpc] [494753.551738] [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] [494753.551773] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [494753.551804] [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [494753.551808] [] ? wake_up_state+0x20/0x20 [494753.551839] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] [494753.551870] [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [494753.551874] [] kthread+0xd1/0xe0 [494753.551875] [] ? insert_kthread_work+0x40/0x40 [494753.551877] [] ret_from_fork_nospec_begin+0xe/0x21 [494753.551879] [] ? insert_kthread_work+0x40/0x40 [494753.551879] Code: 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 60 b9 34 9e 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 41 8b 40 08 <85> c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b 17 0f b7 c2
[494753.620488] NMI watchdog: BUG: soft lockup - CPU#20 stuck for 23s! [ldlm_cn00_016:56369] [494753.620489] Modules linked in: osp(OE) mdd(OE) mdt(OE) lustre(OE) mdc(OE) lod(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin dcdbas amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ses dm_multipath ipmi_si enclosure pcspkr dm_mod sg ipmi_devintf ccp i2c_piix4 ipmi_msghandler k10temp acpi_power_meter knem(OE) ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper ahci syscopyarea mlx5_core(OE) sysfillrect sysimgblt libahci mlxfw(OE) fb_sys_fops devlink ttm crct10dif_pclmul tg3 crct10dif_common mlx_compat(OE) drm megaraid_sas crc32c_intel libata ptp drm_panel_orientation_quirks pps_core mpt3sas(OE) raid_class scsi_transport_sas [last unloaded: osp]
[494753.620533] CPU: 20 PID: 56369 Comm: ldlm_cn00_016 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1 [494753.620534] Hardware name: Dell Inc.
PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018 [494753.620535] task: ffff89978ff65140 ti: ffff8996882f8000 task.ti: ffff8996882f8000 [494753.620535] RIP: 0010:[] [494753.620539] [] native_queued_spin_lock_slowpath+0x122/0x200 [494753.620540] RSP: 0018:ffff8996882fbc38 EFLAGS: 00000246 [494753.620541] RAX: 0000000000000000 RBX: 00000001c0f61fd2 RCX: 0000000000a10000 [494753.620541] RDX: ffff89983f41b780 RSI: 0000000000190001 RDI: ffff8964b4b5cb5c [494753.620542] RBP: ffff8996882fbc38 R08: ffff8967fef5b780 R09: 0000000000000000 [494753.620543] R10: ffff89593fc07600 R11: 0000000000000400 R12: ffff8996882fbbe0 [494753.620543] R13: ffff8980de370fc0 R14: ffff8996882fbba0 R15: ffffffffc0c6a378 [494753.620545] FS: 00007fb5676aa700(0000) GS:ffff8967fef40000(0000) knlGS:0000000000000000 [494753.620546] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [494753.620546] CR2: 00007fb56a20c000 CR3: 0000003dc7410000 CR4: 00000000003407e0 [494753.620547] Call Trace: [494753.620550] [] queued_spin_lock_slowpath+0xb/0xf [494753.620552] [] _raw_spin_lock+0x20/0x30 [494753.620583] [] lock_res_and_lock+0x2c/0x50 [ptlrpc] [494753.620615] [] ldlm_lock_cancel+0x2d/0x1f0 [ptlrpc] [494753.620650] [] ldlm_request_cancel+0x19b/0x740 [ptlrpc] [494753.620685] [] ldlm_handle_cancel+0xba/0x250 [ptlrpc] [494753.620719] [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] [494753.620757] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [494753.620794] [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [494753.620796] [] ? default_wake_function+0x12/0x20 [494753.620799] [] ? __wake_up_common+0x5b/0x90 [494753.620836] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] [494753.620873] [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [494753.620875] [] kthread+0xd1/0xe0 [494753.620877] [] ? insert_kthread_work+0x40/0x40 [494753.620879] [] ret_from_fork_nospec_begin+0xe/0x21 [494753.620880] [] ? 
insert_kthread_work+0x40/0x40 [494753.620881] Code: [494753.620881] 13 [494753.620882] 48 [494753.620882] c1 [494753.620883] ea [494753.620883] 0d [494753.620883] 48 [494753.620884] 98 [494753.620884] 83 [494753.620884] e2 [494753.620885] 30 [494753.620885] 48 [494753.620885] 81 [494753.620886] c2 [494753.620886] 80 [494753.620886] b7 [494753.620887] 01 [494753.620887] 00 [494753.620887] 48 [494753.620888] 03 [494753.620888] 14 [494753.620888] c5 [494753.620889] 60 [494753.620889] b9 [494753.620889] 34 [494753.620890] 9e [494753.620890] 4c [494753.620890] 89 [494753.620891] 02 [494753.620891] 41 [494753.620891] 8b [494753.620892] 40 [494753.620892] 08 [494753.620892] 85 [494753.620893] c0 [494753.620893] 75 [494753.620893] 0f [494753.620894] 0f [494753.620894] 1f [494753.620894] 44 [494753.620895] 00 [494753.620895] 00 [494753.620895] f3 [494753.620896] 90 [494753.620896] <41> [494753.620897] 8b [494753.620897] 40 [494753.620897] 08 [494753.620898] 85 [494753.620898] c0 [494753.620898] 74 [494753.620899] f6 [494753.620899] 4d [494753.620899] 8b [494753.620900] 08 [494753.620900] 4d [494753.620900] 85 [494753.620901] c9 [494753.620901] 74 [494753.620901] 04 [494753.620902] 41 [494753.620902] 0f [494753.620902] 18 [494753.620903] 09 [494753.620903] 8b [494753.620903] [494753.689490] NMI watchdog: BUG: soft lockup - CPU#28 stuck for 23s! 
[ldlm_cn00_024:56414] [494753.689490] Modules linked in: [494753.689491] osp(OE) [494753.689491] mdd(OE) [494753.689492] mdt(OE) [494753.689492] lustre(OE) [494753.689492] mdc(OE) [494753.689493] lod(OE) [494753.689493] lfsck(OE) [494753.689493] mgs(OE) [494753.689494] mgc(OE) [494753.689494] osd_ldiskfs(OE) [494753.689494] lquota(OE) [494753.689495] ldiskfs(OE) [494753.689495] lmv(OE) [494753.689495] osc(OE) [494753.689496] lov(OE) [494753.689496] fid(OE) [494753.689496] fld(OE) [494753.689497] ko2iblnd(OE) [494753.689497] ptlrpc(OE) [494753.689498] obdclass(OE) [494753.689498] lnet(OE) [494753.689498] libcfs(OE) [494753.689499] rpcsec_gss_krb5 [494753.689499] auth_rpcgss [494753.689499] nfsv4 [494753.689500] dns_resolver [494753.689500] nfs [494753.689500] lockd [494753.689501] grace [494753.689501] fscache [494753.689501] rdma_ucm(OE) [494753.689502] ib_ucm(OE) [494753.689502] rdma_cm(OE) [494753.689503] iw_cm(OE) [494753.689503] ib_ipoib(OE) [494753.689503] ib_cm(OE) [494753.689504] ib_umad(OE) [494753.689504] mlx5_fpga_tools(OE) [494753.689504] mlx4_en(OE) [494753.689505] mlx4_ib(OE) [494753.689505] mlx4_core(OE) [494753.689505] dell_rbu [494753.689506] sunrpc [494753.689506] vfat [494753.689506] fat [494753.689507] dm_round_robin [494753.689507] dcdbas [494753.689507] amd64_edac_mod [494753.689508] edac_mce_amd [494753.689508] kvm_amd [494753.689508] kvm [494753.689508] irqbypass [494753.689509] crc32_pclmul [494753.689509] ghash_clmulni_intel [494753.689509] aesni_intel [494753.689510] lrw [494753.689510] gf128mul [494753.689510] glue_helper [494753.689511] ablk_helper [494753.689511] cryptd [494753.689511] ses [494753.689511] dm_multipath [494753.689512] ipmi_si [494753.689512] enclosure [494753.689512] pcspkr [494753.689513] dm_mod [494753.689513] sg [494753.689513] ipmi_devintf [494753.689514] ccp [494753.689514] i2c_piix4 [494753.689514] ipmi_msghandler [494753.689515] k10temp [494753.689515] acpi_power_meter [494753.689515] knem(OE) [494753.689516] 
ip_tables [494753.689516] ext4 [494753.689516] mbcache [494753.689517] jbd2 [494753.689517] sd_mod [494753.689517] crc_t10dif [494753.689518] crct10dif_generic [494753.689518] mlx5_ib(OE) [494753.689518] ib_uverbs(OE) [494753.689519] ib_core(OE) [494753.689519] i2c_algo_bit [494753.689520] drm_kms_helper [494753.689520] ahci [494753.689520] syscopyarea [494753.689521] mlx5_core(OE) [494753.689521] sysfillrect [494753.689521] sysimgblt [494753.689522] libahci [494753.689522] mlxfw(OE) [494753.689522] fb_sys_fops [494753.689523] devlink [494753.689523] ttm [494753.689523] crct10dif_pclmul [494753.689524] tg3 [494753.689524] crct10dif_common [494753.689524] mlx_compat(OE) [494753.689525] drm [494753.689525] megaraid_sas [494753.689525] crc32c_intel [494753.689526] libata [494753.689526] ptp [494753.689526] drm_panel_orientation_quirks [494753.689527] pps_core [494753.689527] mpt3sas(OE) [494753.689527] raid_class [494753.689528] scsi_transport_sas [494753.689528] [last unloaded: osp] [494753.689528] [494753.689531] CPU: 28 PID: 56414 Comm: ldlm_cn00_024 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1 [494753.689531] Hardware name: Dell Inc. 
PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018 [494753.689533] task: ffff8996ddabe180 ti: ffff8997912f4000 task.ti: ffff8997912f4000 [494753.689533] RIP: 0010:[] [494753.689536] [] native_queued_spin_lock_slowpath+0x122/0x200 [494753.689537] RSP: 0018:ffff8997912f7bb8 EFLAGS: 00000246 [494753.689538] RAX: 0000000000000000 RBX: ffff8997912f7bf0 RCX: 0000000000e10000 [494753.689539] RDX: ffff8987ff61b780 RSI: 0000000000110000 RDI: ffff8964b4b5cb5c [494753.689539] RBP: ffff8997912f7bb8 R08: ffff8967fefdb780 R09: 0000000000000000 [494753.689540] R10: ffff89593fc07600 R11: 00000000000001c5 R12: 0000000000000000 [494753.689541] R13: 0000000000000000 R14: 0000000000000002 R15: ffff896761e63a80 [494753.689542] FS: 00007fc03fe86880(0000) GS:ffff8967fefc0000(0000) knlGS:0000000000000000 [494753.689543] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [494753.689544] CR2: 00007fc039102000 CR3: 0000003dc7410000 CR4: 00000000003407e0 [494753.689545] Call Trace: [494753.689547] [] queued_spin_lock_slowpath+0xb/0xf [494753.689549] [] _raw_spin_lock+0x20/0x30 [494753.689581] [] lock_res_and_lock+0x2c/0x50 [ptlrpc] [494753.689614] [] ldlm_cancel_callback+0x92/0x330 [ptlrpc] [494753.689616] [] ? native_queued_spin_lock_slowpath+0x122/0x200 [494753.689648] [] ldlm_lock_cancel+0x56/0x1f0 [ptlrpc] [494753.689683] [] ldlm_request_cancel+0x19b/0x740 [ptlrpc] [494753.689718] [] ldlm_handle_cancel+0xba/0x250 [ptlrpc] [494753.689752] [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] [494753.689792] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [494753.689830] [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [494753.689832] [] ? default_wake_function+0x12/0x20 [494753.689833] [] ? __wake_up_common+0x5b/0x90 [494753.689871] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] [494753.689909] [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [494753.689911] [] kthread+0xd1/0xe0 [494753.689912] [] ? insert_kthread_work+0x40/0x40 [494753.689914] [] ret_from_fork_nospec_begin+0xe/0x21 [494753.689916] [] ? 
insert_kthread_work+0x40/0x40 [494753.689916] Code: [494753.689917] 13 [494753.689917] 48 [494753.689918] c1 [494753.689918] ea [494753.689918] 0d [494753.689919] 48 [494753.689919] 98 [494753.689919] 83 [494753.689920] e2 [494753.689920] 30 [494753.689920] 48 [494753.689920] 81 [494753.689921] c2 [494753.689921] 80 [494753.689921] b7 [494753.689922] 01 [494753.689922] 00 [494753.689922] 48 [494753.689923] 03 [494753.689923] 14 [494753.689923] c5 [494753.689924] 60 [494753.689924] b9 [494753.689924] 34 [494753.689925] 9e [494753.689925] 4c [494753.689925] 89 [494753.689926] 02 [494753.689926] 41 [494753.689926] 8b [494753.689927] 40 [494753.689927] 08 [494753.689927] 85 [494753.689928] c0 [494753.689928] 75 [494753.689928] 0f [494753.689928] 0f [494753.689929] 1f [494753.689929] 44 [494753.689929] 00 [494753.689930] 00 [494753.689930] f3 [494753.689930] 90 [494753.689931] <41> [494753.689931] 8b [494753.689931] 40 [494753.689932] 08 [494753.689932] 85 [494753.689932] c0 [494753.689933] 74 [494753.689933] f6 [494753.689933] 4d [494753.689934] 8b [494753.689934] 08 [494753.689934] 4d [494753.689935] 85 [494753.689935] c9 [494753.689935] 74 [494753.689936] 04 [494753.689936] 41 [494753.689936] 0f [494753.689937] 18 [494753.689937] 09 [494753.689937] 8b [494753.689937] [494754.582548] LNet: 53887:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.209@o2ib7: 2 seconds [494754.582551] LNet: 53887:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 14 previous similar messages [494754.640485] pcspkr [494754.642703] dm_mod sg ipmi_devintf ccp i2c_piix4 ipmi_msghandler k10temp acpi_power_meter knem(OE) ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper ahci syscopyarea mlx5_core(OE) sysfillrect sysimgblt libahci mlxfw(OE) fb_sys_fops devlink ttm crct10dif_pclmul tg3 crct10dif_common mlx_compat(OE) drm megaraid_sas crc32c_intel libata ptp drm_panel_orientation_quirks pps_core 
mpt3sas(OE) raid_class scsi_transport_sas [last unloaded: osp] [494754.688041] CPU: 4 PID: 56404 Comm: ldlm_cn00_020 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1 [494754.700633] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018 [494754.708285] task: ffff8997cb27b0c0 ti: ffff89969232c000 task.ti: ffff89969232c000 [494754.715851] RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 [494754.725871] RSP: 0018:ffff89969232fc38 EFLAGS: 00000246 [494754.731269] RAX: 0000000000000000 RBX: 00000001c0f61fd2 RCX: 0000000000210000 [494754.738488] RDX: ffff89983f55b780 RSI: 0000000000b90001 RDI: ffff8964b4b5cb5c [494754.745708] RBP: ffff89969232fc38 R08: ffff8967fee5b780 R09: 0000000000000000 [494754.752929] R10: ffff89593fc07600 R11: 0000000000000400 R12: ffff89969232fbe0 [494754.760149] R13: ffff8980de370fc0 R14: ffff89969232fba0 R15: ffffffffc0c6a378 [494754.767369] FS: 00007fb56a1fd740(0000) GS:ffff8967fee40000(0000) knlGS:0000000000000000 [494754.775541] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [494754.781374] CR2: 00007fc03fe97000 CR3: 0000003dc7410000 CR4: 00000000003407e0 [494754.788593] Call Trace: [494754.791133] [] queued_spin_lock_slowpath+0xb/0xf [494754.797487] [] _raw_spin_lock+0x20/0x30 [494754.803087] [] lock_res_and_lock+0x2c/0x50 [ptlrpc] [494754.809727] [] ldlm_lock_cancel+0x2d/0x1f0 [ptlrpc] [494754.816369] [] ldlm_request_cancel+0x19b/0x740 [ptlrpc] [494754.823364] [] ldlm_handle_cancel+0xba/0x250 [ptlrpc] [494754.830183] [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] [494754.837181] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [494754.844954] [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [494754.851827] [] ? default_wake_function+0x12/0x20 [494754.858179] [] ? __wake_up_common+0x5b/0x90 [494754.864133] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] [494754.870523] [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [494754.878002] [] kthread+0xd1/0xe0 [494754.882969] [] ? 
insert_kthread_work+0x40/0x40 [494754.889148] [] ret_from_fork_nospec_begin+0xe/0x21 [494754.895673] [] ? insert_kthread_work+0x40/0x40 [494754.901850] Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 60 b9 34 9e 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b [494755.200998] Lustre: 12353:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (32:2s); client may timeout. req@ffff896caf654450 x1624761333430736/t34426287616(0) o4->83e72c3d-872c-7a5f-f5c1-edf566d41d60@10.9.107.1@o2ib4:61/0 lens 488/416 e 1 to 0 dl 1550012041 ref 2 fl Complete:/0/0 rc 0/0 [494755.229976] Lustre: 12353:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 20 previous similar messages [494755.240313] Lustre: 12348:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 15s req@ffff896cb1f30450 x1624699442285104/t0(0) o4->82c763a5-8dc0-7fc4-d90b-e4497b03f725@10.9.104.57@o2ib4:0/0 lens 488/0 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1 [494755.264267] Lustre: 12348:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 18 previous similar messages [494755.275807] Lustre: mdt_io: This server is not able to keep up with request traffic (cpu-bound). [494755.284682] Lustre: 12237:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=27 reqQ=30 recA=12, svcEst=56, delay=0 [494755.425490] Lustre: MGS: Connection restored to cdeb9a20-d3dc-d919-0f35-34da2b8a5abc (at 10.9.101.41@o2ib4) [494755.435314] Lustre: Skipped 361 previous similar messages [494756.590297] Lustre: 12237:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-5s), not sending early reply. Consider increasing at_early_margin (5)? 
req@ffff896bfa66a450 x1624703100511456/t0(0) o4->797f0075-ec92-4d37-f23e-cc9ca768ea89@10.9.113.5@o2ib4:59/0 lens 1736/0 e 0 to 0 dl 1550012039 ref 2 fl New:/0/ffffffff rc 0/-1 [494757.474586] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [ldlm_cn03_012:105787] [494757.482764] Modules linked in: osp(OE) mdd(OE) mdt(OE) lustre(OE) mdc(OE) lod(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin dcdbas amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ses dm_multipath ipmi_si enclosure pcspkr dm_mod sg ipmi_devintf ccp i2c_piix4 ipmi_msghandler k10temp acpi_power_meter knem(OE) ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif [494757.555764] crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper ahci syscopyarea mlx5_core(OE) sysfillrect sysimgblt libahci mlxfw(OE) fb_sys_fops devlink ttm crct10dif_pclmul tg3 crct10dif_common mlx_compat(OE) drm megaraid_sas crc32c_intel libata ptp drm_panel_orientation_quirks pps_core mpt3sas(OE) raid_class scsi_transport_sas [last unloaded: osp] [494757.589188] CPU: 3 PID: 105787 Comm: ldlm_cn03_012 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1 [494757.601880] Hardware name: Dell Inc. 
PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018 [494757.609534] task: ffff8996b66d30c0 ti: ffff8997f9bec000 task.ti: ffff8997f9bec000 [494757.617099] RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x126/0x200 [494757.627119] RSP: 0018:ffff8997f9befbb8 EFLAGS: 00000246 [494757.632519] RAX: 0000000000000000 RBX: ffff8997f9befbf0 RCX: 0000000000190000 [494757.639737] RDX: ffff89983f6db780 RSI: 0000000001790000 RDI: ffff8964b4b5cb5c [494757.646959] RBP: ffff8997f9befbb8 R08: ffff89983f41b780 R09: 0000000000000000 [494757.654176] R10: ffff89593fc07600 R11: ffffc45235dbd200 R12: 0000000000000000 [494757.661397] R13: 0000000000000000 R14: 0000000000000002 R15: ffff896b47a54800 [494757.668616] FS: 00007f27b989e880(0000) GS:ffff89983f400000(0000) knlGS:0000000000000000 [494757.676789] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [494757.682622] CR2: 0000000002066020 CR3: 00000030398cc000 CR4: 00000000003407e0 [494757.689842] Call Trace: [494757.692386] [] queued_spin_lock_slowpath+0xb/0xf [494757.698746] [] _raw_spin_lock+0x20/0x30 [494757.704360] [] lock_res_and_lock+0x2c/0x50 [ptlrpc] [494757.711004] [] ldlm_cancel_callback+0x92/0x330 [ptlrpc] [494757.717966] [] ? native_queued_spin_lock_slowpath+0x122/0x200 [494757.725476] [] ldlm_lock_cancel+0x56/0x1f0 [ptlrpc] [494757.732157] [] ldlm_request_cancel+0x19b/0x740 [ptlrpc] [494757.739156] [] ldlm_handle_cancel+0xba/0x250 [ptlrpc] [494757.745975] [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] [494757.752979] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [494757.760757] [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [494757.767630] [] ? default_wake_function+0x12/0x20 [494757.773982] [] ? __wake_up_common+0x5b/0x90 [494757.779936] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] [494757.786324] [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [494757.793803] [] kthread+0xd1/0xe0 [494757.798769] [] ? insert_kthread_work+0x40/0x40 [494757.804948] [] ret_from_fork_nospec_begin+0xe/0x21 [494757.811474] [] ? 
insert_kthread_work+0x40/0x40 [494757.817650] Code: 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 60 b9 34 9e 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 41 8b 40 08 <85> c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b 17 0f b7 c2 [494758.338262] LustreError: 12348:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.23.35@o2ib6: deadline 32:1s ago req@ffff896c9f7be850 x1624705154179136/t0(0) o4->153a9818-8d26-403e-7a91-27bec80982b8@10.8.23.35@o2ib6:65/0 lens 488/0 e 1 to 0 dl 1550012045 ref 2 fl Interpret:/0/ffffffff rc 0/-1 [494758.369928] LustreError: 12348:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 37 previous similar messages [494758.380631] Lustre: 12348:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (32:1s); client may timeout. req@ffff896c9f7be850 x1624705154179136/t0(0) o4->153a9818-8d26-403e-7a91-27bec80982b8@10.8.23.35@o2ib6:65/0 lens 488/0 e 1 to 0 dl 1550012045 ref 2 fl Interpret:/0/ffffffff rc 0/-1 [494758.409451] Lustre: 12348:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1 previous similar message [494758.427642] LustreError: 12237:0:(tgt_handler.c:2548:tgt_brw_write()) fir-MDT0002: Dropping timed-out write from 12345-10.8.13.5@o2ib6 because locking object 0x2c000434a:181 took 33 seconds (limit was 32). 
[494759.390650] LustreError: 14078:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8966d7694050 x1624701002717072/t0(0) o4->f88d3e4f-b8ad-7e3f-e052-b857e571de2a@10.9.107.13@o2ib4:93/0 lens 488/448 e 2 to 0 dl 1550012073 ref 1 fl Interpret:/0/0 rc 0/0 [494759.390653] LustreError: 14075:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff8966d7694450 x1624701002716944/t0(0) o4->f88d3e4f-b8ad-7e3f-e052-b857e571de2a@10.9.107.13@o2ib4:93/0 lens 488/448 e 2 to 0 dl 1550012073 ref 1 fl Interpret:/0/0 rc 0/0 [494759.390657] LustreError: 14075:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 1 previous similar message [494759.582647] LNetError: 53887:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: tx_queue, 1 seconds [494759.592730] LNetError: 53887:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.201@o2ib7 (7): c: 0, oc: 0, rc: 8 [494759.604851] LustreError: 53887:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff89636ce52000 [494759.845840] LustreError: 12348:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.9.106.18@o2ib4: deadline 31:2s ago req@ffff896c9f7bb050 x1624748396237840/t0(0) o4->25262cb2-e449-1554-b5d1-0b6e448154f1@10.9.106.18@o2ib4:65/0 lens 488/0 e 0 to 0 dl 1550012045 ref 2 fl Interpret:/2/ffffffff rc 0/-1 [494759.877681] LustreError: 12348:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 23 previous similar messages [494760.011799] Lustre: 12232:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-3s), not sending early reply. Consider increasing at_early_margin (5)? 
req@ffff896caf651450 x1624705154179040/t0(0) o4->153a9818-8d26-403e-7a91-27bec80982b8@10.8.23.35@o2ib6:65/0 lens 488/448 e 1 to 0 dl 1550012045 ref 2 fl Interpret:/0/0 rc 0/0 [494760.042517] Lustre: 12232:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 9 previous similar messages [494760.119879] LustreError: 14074:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk WRITE after 0+3s req@ffff896c9f7bec50 x1624705154179008/t0(0) o4->153a9818-8d26-403e-7a91-27bec80982b8@10.8.23.35@o2ib6:65/0 lens 488/448 e 1 to 0 dl 1550012045 ref 1 fl Interpret:/0/0 rc 0/0 [494760.582684] LustreError: 53887:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff89636ce53400 [494760.593562] LustreError: 53887:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8994497e2400 [494760.898941] LustreError: 53896:0:(events.c:305:request_in_callback()) event type 2, status -5, service mdt_io [494760.909322] LustreError: 53896:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff898305bade00 [494761.582707] LustreError: 53887:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8966d8765200 [494762.360306] Lustre: 12275:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (32:11s); client may timeout. 
req@ffff89724f74c450 x1624753475252784/t34426287702(0) o4->16db5e03-e2d9-f103-4a0b-78f283c497a4@10.8.3.2@o2ib6:59/0 lens 504/416 e 1 to 0 dl 1550012039 ref 1 fl Complete:/0/0 rc 0/0 [494762.389216] Lustre: 12275:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 31 previous similar messages [494762.753187] LustreError: 12343:0:(pack_generic.c:590:__lustre_unpack_msg()) message length 0 too small for magic/version check [494762.764665] LustreError: 12343:0:(sec.c:2068:sptlrpc_svc_unwrap_request()) error unpacking request from 12345-10.8.8.5@o2ib6 x1624736839009616 [494764.117919] LustreError: 53900:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff8996c4387c00 [494764.582883] LustreError: 53887:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff897de73b2400 [494765.492792] NMI watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [ldlm_cn01_014:56364] [494765.560790] NMI watchdog: BUG: soft lockup - CPU#13 stuck for 22s! [ldlm_cn01_031:56389] [494765.500880] Modules linked in: osp(OE) mdd(OE) mdt(OE) lustre(OE) mdc(OE) lod(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin dcdbas amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ses dm_multipath ipmi_si enclosure [494765.560791] Modules linked in: [494765.560792] osp(OE) [494765.560793] mdd(OE) [494765.560793] mdt(OE) [494765.560794] lustre(OE) [494765.560794] mdc(OE) [494765.560795] lod(OE) [494765.560795] lfsck(OE) [494765.560796] mgs(OE) [494765.560796] mgc(OE) [494765.560797] osd_ldiskfs(OE) [494765.560797] 
lquota(OE) [494765.560797] ldiskfs(OE) [494765.560798] lmv(OE) [494765.560798] osc(OE) [494765.560798] lov(OE) [494765.560799] fid(OE) [494765.560799] fld(OE) [494765.560800] ko2iblnd(OE) [494765.560800] ptlrpc(OE) [494765.560800] obdclass(OE) [494765.560801] lnet(OE) [494765.560801] libcfs(OE) [494765.560802] rpcsec_gss_krb5 [494765.560802] auth_rpcgss [494765.560802] nfsv4 [494765.560803] dns_resolver [494765.560803] nfs [494765.560804] lockd [494765.560804] grace [494765.560804] fscache [494765.560805] rdma_ucm(OE) [494765.560805] ib_ucm(OE) [494765.560805] rdma_cm(OE) [494765.560806] iw_cm(OE) [494765.560806] ib_ipoib(OE) [494765.560807] ib_cm(OE) [494765.560807] ib_umad(OE) [494765.560807] mlx5_fpga_tools(OE) [494765.560808] mlx4_en(OE) [494765.560808] mlx4_ib(OE) [494765.560809] mlx4_core(OE) [494765.560809] dell_rbu [494765.560809] sunrpc [494765.560810] vfat [494765.560810] fat [494765.560811] dm_round_robin [494765.560811] dcdbas [494765.560811] amd64_edac_mod [494765.560812] edac_mce_amd [494765.560812] kvm_amd [494765.560812] kvm [494765.560813] irqbypass [494765.560813] crc32_pclmul [494765.560813] ghash_clmulni_intel [494765.560814] aesni_intel [494765.560814] lrw [494765.560814] gf128mul [494765.560815] glue_helper [494765.560815] ablk_helper [494765.560816] cryptd [494765.560816] ses [494765.560816] dm_multipath [494765.560817] ipmi_si [494765.560817] enclosure [494765.560818] pcspkr [494765.560818] dm_mod [494765.560819] sg [494765.560819] ipmi_devintf [494765.560820] ccp [494765.560820] i2c_piix4 [494765.560821] ipmi_msghandler [494765.560821] k10temp [494765.560822] acpi_power_meter [494765.560823] knem(OE) [494765.560823] ip_tables [494765.560824] ext4 [494765.560825] mbcache [494765.560825] jbd2 [494765.560826] sd_mod [494765.560826] crc_t10dif [494765.560827] crct10dif_generic [494765.560827] mlx5_ib(OE) [494765.560828] ib_uverbs(OE) [494765.560829] ib_core(OE) [494765.560829] i2c_algo_bit [494765.560830] drm_kms_helper [494765.560830] ahci 
[494765.560831] syscopyarea [494765.560832] mlx5_core(OE) [494765.560833] sysfillrect [494765.560833] sysimgblt [494765.560834] libahci [494765.560834] mlxfw(OE) [494765.560835] fb_sys_fops [494765.560835] devlink [494765.560835] ttm [494765.560836] crct10dif_pclmul [494765.560836] tg3 [494765.560837] crct10dif_common [494765.560838] mlx_compat(OE) [494765.560838] drm [494765.560839] megaraid_sas [494765.560839] crc32c_intel [494765.560839] libata [494765.560840] ptp [494765.560841] drm_panel_orientation_quirks [494765.560842] pps_core [494765.560842] mpt3sas(OE) [494765.560843] raid_class [494765.560844] scsi_transport_sas [494765.560844] [last unloaded: osp] [494765.560845] [494765.560848] CPU: 13 PID: 56389 Comm: ldlm_cn01_031 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1 [494765.560849] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018 [494765.560850] task: ffff89976bf94100 ti: ffff8997bc754000 task.ti: ffff8997bc754000 [494765.560852] RIP: 0010:[] [494765.560859] [] native_queued_spin_lock_slowpath+0x122/0x200 [494765.560860] RSP: 0018:ffff8997bc757c38 EFLAGS: 00000246 [494765.560861] RAX: 0000000000000000 RBX: 00000001c0f61fd2 RCX: 0000000000690000 [494765.560862] RDX: ffff8987ff71b780 RSI: 0000000000910001 RDI: ffff8964b4b5cb5c [494765.560863] RBP: ffff8997bc757c38 R08: ffff8977ff6db780 R09: 0000000000000000 [494765.560863] R10: ffff89593fc07600 R11: ffffc451e7eaa200 R12: ffff8997bc757be0 [494765.560864] R13: ffff8980de370fc0 R14: ffff8997bc757ba0 R15: ffffffffc0c6a378 [494765.560866] FS: 00007fc03fe86880(0000) GS:ffff8977ff6c0000(0000) knlGS:0000000000000000 [494765.560867] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [494765.560867] CR2: 00007f3fc022f2e0 CR3: 0000003dc7410000 CR4: 00000000003407e0 [494765.560869] Call Trace: [494765.560875] [] queued_spin_lock_slowpath+0xb/0xf [494765.560879] [] _raw_spin_lock+0x20/0x30 [494765.560915] [] lock_res_and_lock+0x2c/0x50 [ptlrpc] [494765.560948] [] 
ldlm_lock_cancel+0x2d/0x1f0 [ptlrpc] [494765.560983] [] ldlm_request_cancel+0x19b/0x740 [ptlrpc] [494765.561018] [] ldlm_handle_cancel+0xba/0x250 [ptlrpc] [494765.561052] [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] [494765.561092] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [494765.561130] [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [494765.561134] [] ? default_wake_function+0x12/0x20 [494765.561137] [] ? __wake_up_common+0x5b/0x90 [494765.561174] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] [494765.561211] [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [494765.561214] [] kthread+0xd1/0xe0 [494765.561216] [] ? insert_kthread_work+0x40/0x40 [494765.561218] [] ret_from_fork_nospec_begin+0xe/0x21 [494765.561220] [] ? insert_kthread_work+0x40/0x40 [494765.561220] Code: [494765.561221] 13 [494765.561222] 48 [494765.561222] c1 [494765.561223] ea [494765.561223] 0d [494765.561223] 48 [494765.561224] 98 [494765.561224] 83 [494765.561224] e2 [494765.561225] 30 [494765.561225] 48 [494765.561225] 81 [494765.561226] c2 [494765.561226] 80 [494765.561226] b7 [494765.561227] 01 [494765.561227] 00 [494765.561227] 48 [494765.561228] 03 [494765.561228] 14 [494765.561228] c5 [494765.561228] 60 [494765.561229] b9 [494765.561229] 34 [494765.561229] 9e [494765.561230] 4c [494765.561230] 89 [494765.561230] 02 [494765.561231] 41 [494765.561231] 8b [494765.561231] 40 [494765.561232] 08 [494765.561232] 85 [494765.561232] c0 [494765.561233] 75 [494765.561233] 0f [494765.561233] 0f [494765.561234] 1f [494765.561234] 44 [494765.561234] 00 [494765.561235] 00 [494765.561235] f3 [494765.561235] 90 [494765.561236] <41> [494765.561236] 8b [494765.561236] 40 [494765.561237] 08 [494765.561237] 85 [494765.561237] c0 [494765.561238] 74 [494765.561238] f6 [494765.561238] 4d [494765.561239] 8b [494765.561239] 08 [494765.561239] 4d [494765.561240] 85 [494765.561240] c9 [494765.561240] 74 [494765.561241] 04 [494765.561241] 41 [494765.561241] 0f [494765.561242] 18 [494765.561242] 09 [494765.561242] 8b 
[494765.561243] [494765.582843] LNetError: 53887:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) [494765.582846] LustreError: 53887:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff896e3bb0f400 [494765.681792] NMI watchdog: BUG: soft lockup - CPU#27 stuck for 22s! [ldlm_cn03_010:72691] [494765.681793] Modules linked in: [494765.681795] osp(OE) [494765.681795] mdd(OE) [494765.681796] mdt(OE) [494765.681797] lustre(OE) [494765.681797] mdc(OE) [494765.681798] lod(OE) [494765.681799] lfsck(OE) [494765.681799] mgs(OE) [494765.681800] mgc(OE) [494765.681801] osd_ldiskfs(OE) [494765.681802] lquota(OE) [494765.681802] ldiskfs(OE) [494765.681803] lmv(OE) [494765.681804] osc(OE) [494765.681804] lov(OE) [494765.681805] fid(OE) [494765.681805] fld(OE) [494765.681806] ko2iblnd(OE) [494765.681806] ptlrpc(OE) [494765.681807] obdclass(OE) [494765.681808] lnet(OE) [494765.681808] libcfs(OE) [494765.681809] rpcsec_gss_krb5 [494765.681810] auth_rpcgss [494765.681811] nfsv4 [494765.681811] dns_resolver [494765.681812] nfs [494765.681813] lockd [494765.681813] grace [494765.681814] fscache [494765.681815] rdma_ucm(OE) [494765.681816] ib_ucm(OE) [494765.681816] rdma_cm(OE) [494765.681817] iw_cm(OE) [494765.681818] ib_ipoib(OE) [494765.681818] ib_cm(OE) [494765.681819] ib_umad(OE) [494765.681820] mlx5_fpga_tools(OE) [494765.681821] mlx4_en(OE) [494765.681821] mlx4_ib(OE) [494765.681822] mlx4_core(OE) [494765.681823] dell_rbu [494765.681823] sunrpc [494765.681824] vfat [494765.681825] fat [494765.681825] dm_round_robin [494765.681826] dcdbas [494765.681827] amd64_edac_mod [494765.681827] edac_mce_amd [494765.681828] kvm_amd [494765.681828] kvm [494765.681829] irqbypass [494765.681829] crc32_pclmul [494765.681830] ghash_clmulni_intel [494765.681830] aesni_intel [494765.681831] lrw [494765.681831] gf128mul [494765.681832] glue_helper [494765.681832] ablk_helper [494765.681833] cryptd [494765.681834] 
ses dm_multipath ipmi_si enclosure pcspkr dm_mod sg ipmi_devintf ccp i2c_piix4 ipmi_msghandler k10temp acpi_power_meter knem(OE) ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper ahci syscopyarea mlx5_core(OE) sysfillrect sysimgblt libahci mlxfw(OE) fb_sys_fops devlink ttm crct10dif_pclmul tg3 crct10dif_common mlx_compat(OE) drm megaraid_sas crc32c_intel libata ptp drm_panel_orientation_quirks pps_core mpt3sas(OE) raid_class scsi_transport_sas [last unloaded: osp] [494765.681865] CPU: 27 PID: 72691 Comm: ldlm_cn03_010 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1 [494765.681866] Hardware name: Dell Inc. 
PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018 [494765.681867] task: ffff8977f60d0000 ti: ffff899646f34000 task.ti: ffff899646f34000 [494765.681868] RIP: 0010:[] [494765.681872] [] native_queued_spin_lock_slowpath+0x122/0x200 [494765.681873] RSP: 0018:ffff899646f37c38 EFLAGS: 00000246 [494765.681874] RAX: 0000000000000000 RBX: 00000001c0f61fd2 RCX: 0000000000d90000 [494765.681875] RDX: ffff8987ff69b780 RSI: 0000000000510000 RDI: ffff8964b4b5cb5c [494765.681876] RBP: ffff899646f37c38 R08: ffff89983f59b780 R09: 0000000000000000 [494765.681876] R10: ffff89593fc07600 R11: ffffc4523b189800 R12: ffff899646f37be0 [494765.681877] R13: ffff8980de370fc0 R14: ffff899646f37ba0 R15: ffffffffc0c6a378 [494765.681878] FS: 00007f76742ee700(0000) GS:ffff89983f580000(0000) knlGS:0000000000000000 [494765.681879] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [494765.681880] CR2: 00007f92c1cc6248 CR3: 0000003dc7410000 CR4: 00000000003407e0 [494765.681881] Call Trace: [494765.681884] [] queued_spin_lock_slowpath+0xb/0xf [494765.681887] [] _raw_spin_lock+0x20/0x30 [494765.681928] [] lock_res_and_lock+0x2c/0x50 [ptlrpc] [494765.681959] [] ldlm_lock_cancel+0x2d/0x1f0 [ptlrpc] [494765.681994] [] ldlm_request_cancel+0x19b/0x740 [ptlrpc] [494765.682028] [] ldlm_handle_cancel+0xba/0x250 [ptlrpc] [494765.682061] [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] [494765.682100] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [494765.682136] [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [494765.682172] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] [494765.682208] [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [494765.682210] [] kthread+0xd1/0xe0 [494765.682212] [] ? insert_kthread_work+0x40/0x40 [494765.682214] [] ret_from_fork_nospec_begin+0xe/0x21 [494765.682215] [] ? 
insert_kthread_work+0x40/0x40 [494765.682216] Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 60 b9 34 9e 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b [494765.776797] NMI watchdog: BUG: soft lockup - CPU#38 stuck for 22s! 
[ldlm_cn02_023:56405] [494765.776799] Modules linked in: osp(OE) mdd(OE) mdt(OE) lustre(OE) mdc(OE) lod(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin dcdbas amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ses dm_multipath ipmi_si enclosure pcspkr dm_mod sg ipmi_devintf ccp i2c_piix4 ipmi_msghandler k10temp acpi_power_meter knem(OE) 
ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper ahci syscopyarea mlx5_core(OE) sysfillrect sysimgblt libahci mlxfw(OE) fb_sys_fops devlink ttm crct10dif_pclmul tg3 crct10dif_common mlx_compat(OE) drm megaraid_sas crc32c_intel libata ptp drm_panel_orientation_quirks pps_core mpt3sas(OE) raid_class scsi_transport_sas [last unloaded: osp] [494765.776866] CPU: 38 PID: 56405 Comm: ldlm_cn02_023 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1 [494765.776867] Hardware name: Dell Inc. 
PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018 [494765.776868] task: ffff8997cb27c100 ti: ffff899692320000 task.ti: ffff899692320000 [494765.776870] RIP: 0010:[] [494765.776877] [] native_queued_spin_lock_slowpath+0x122/0x200 [494765.776878] RSP: 0018:ffff899692323c38 EFLAGS: 00000246 [494765.776878] RAX: 0000000000000000 RBX: 00000001c0f61fd2 RCX: 0000000001310000 [494765.776879] RDX: ffff8987ff79b780 RSI: 0000000000d10000 RDI: ffff8964b4b5cb5c [494765.776880] RBP: ffff899692323c38 R08: ffff8987ff85b780 R09: 0000000000000000 [494765.776880] R10: ffff89593fc07600 R11: ffff89840a3950c0 R12: ffff899692323be0 [494765.776881] R13: ffff8980de370fc0 R14: ffff899692323ba0 R15: ffffffffc0c6a378 [494765.776882] FS: 00007fb5656a6700(0000) GS:ffff8987ff840000(0000) knlGS:0000000000000000 [494765.776883] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [494765.776884] CR2: 00007fb56a20c000 CR3: 0000003dc7410000 CR4: 00000000003407e0 [494765.776885] Call Trace: [494765.776891] [] queued_spin_lock_slowpath+0xb/0xf [494765.776895] [] _raw_spin_lock+0x20/0x30 [494765.776934] [] lock_res_and_lock+0x2c/0x50 [ptlrpc] [494765.776960] [] ldlm_lock_cancel+0x2d/0x1f0 [ptlrpc] [494765.776990] [] ldlm_request_cancel+0x19b/0x740 [ptlrpc] [494765.777019] [] ldlm_handle_cancel+0xba/0x250 [ptlrpc] [494765.777047] [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] [494765.777083] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [494765.777114] [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [494765.777120] [] ? default_wake_function+0x12/0x20 [494765.777123] [] ? __wake_up_common+0x5b/0x90 [494765.777154] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] [494765.777185] [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [494765.777189] [] kthread+0xd1/0xe0 [494765.777191] [] ? insert_kthread_work+0x40/0x40 [494765.777194] [] ret_from_fork_nospec_begin+0xe/0x21 [494765.777195] [] ? 
insert_kthread_work+0x40/0x40 [494765.777196] Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 60 b9 34 9e 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b [494766.618933] pcspkr [494766.621152] dm_mod sg ipmi_devintf ccp i2c_piix4 ipmi_msghandler k10temp acpi_power_meter knem(OE) ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper ahci syscopyarea mlx5_core(OE) sysfillrect sysimgblt libahci mlxfw(OE) fb_sys_fops devlink ttm crct10dif_pclmul tg3 crct10dif_common mlx_compat(OE) drm megaraid_sas crc32c_intel libata ptp drm_panel_orientation_quirks pps_core mpt3sas(OE) raid_class scsi_transport_sas [last unloaded: osp] [494766.666491] CPU: 5 PID: 56364 Comm: ldlm_cn01_014 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1 [494766.679081] Hardware name: Dell 
Inc. PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018 [494766.686735] task: ffff89978ff60000 ti: ffff89968eb6c000 task.ti: ffff89968eb6c000 [494766.694299] RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x126/0x200 [494766.704320] RSP: 0018:ffff89968eb6fbb8 EFLAGS: 00000246 [494766.709720] RAX: 0000000000000000 RBX: ffff89968eb6fbf0 RCX: 0000000000290000 [494766.716938] RDX: ffff8987ff79b780 RSI: 0000000000d10000 RDI: ffff8964b4b5cb5c [494766.724156] RBP: ffff89968eb6fbb8 R08: ffff8977ff65b780 R09: 0000000000000000 [494766.731378] R10: ffff89593fc07600 R11: ffff896cf3adc908 R12: 0000000000000000 [494766.738595] R13: 0000000000000000 R14: 0000000000000002 R15: ffff89778dac7980 [494766.745818] FS: 00007faf1af90780(0000) GS:ffff8977ff640000(0000) knlGS:0000000000000000 [494766.753990] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [494766.759821] CR2: 00007faf1afa0000 CR3: 0000003dc7410000 CR4: 00000000003407e0 [494766.767042] Call Trace: [494766.769584] [] queued_spin_lock_slowpath+0xb/0xf [494766.775934] [] _raw_spin_lock+0x20/0x30 [494766.781537] [] lock_res_and_lock+0x2c/0x50 [ptlrpc] [494766.788177] [] ldlm_cancel_callback+0x92/0x330 [ptlrpc] [494766.795141] [] ? native_queued_spin_lock_slowpath+0x126/0x200 [494766.802648] [] ldlm_lock_cancel+0x56/0x1f0 [ptlrpc] [494766.809290] [] ldlm_request_cancel+0x19b/0x740 [ptlrpc] [494766.816286] [] ldlm_handle_cancel+0xba/0x250 [ptlrpc] [494766.823106] [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] [494766.830104] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [494766.837878] [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [494766.844749] [] ? default_wake_function+0x12/0x20 [494766.851101] [] ? __wake_up_common+0x5b/0x90 [494766.857057] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] [494766.863443] [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [494766.870924] [] kthread+0xd1/0xe0 [494766.875888] [] ? insert_kthread_work+0x40/0x40 [494766.882069] [] ret_from_fork_nospec_begin+0xe/0x21 [494766.888596] [] ? 
insert_kthread_work+0x40/0x40 [494766.894773] Code: 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 60 b9 34 9e 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 41 8b 40 08 <85> c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b 17 0f b7 c2 [494767.582878] LustreError: 53887:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff896a773dd400 [494767.593752] LustreError: 53887:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff896a773dc200 [494768.244922] Lustre: 12250:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (32:11s); client may timeout. req@ffff896caf651450 x1624705154179040/t34426287730(0) o4->153a9818-8d26-403e-7a91-27bec80982b8@10.8.23.35@o2ib6:65/0 lens 488/416 e 1 to 0 dl 1550012045 ref 1 fl Complete:/0/0 rc 0/0 [494768.751914] LustreError: 13760:0:(ldlm_lib.c:3264:target_bulk_io()) @@@ network error on bulk WRITE req@ffff8995ee6e1850 x1624700720499056/t0(0) o4->c50e9e63-bc69-ffb4-d9c5-0a1d77a8b849@10.9.106.60@o2ib4:80/0 lens 488/448 e 0 to 0 dl 1550012060 ref 2 fl Interpret:/0/0 rc 0/0 [494768.776410] Lustre: fir-MDT0002: Bulk IO write error with c50e9e63-bc69-ffb4-d9c5-0a1d77a8b849 (at 10.9.106.60@o2ib4), client will retry: rc = -110 [494768.789700] Lustre: Skipped 4 previous similar messages [494769.526891] NMI watchdog: BUG: soft lockup - CPU#9 stuck for 21s! 
[ldlm_cn01_028:56386] [494769.582992] LustreError: 53887:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff899566267e00 [494769.534975] Modules linked in: osp(OE) mdd(OE) mdt(OE) lustre(OE) mdc(OE) lod(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin dcdbas amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ses dm_multipath ipmi_si enclosure pcspkr dm_mod sg ipmi_devintf ccp i2c_piix4 ipmi_msghandler k10temp acpi_power_meter knem(OE) ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif [494769.618835] crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper ahci syscopyarea mlx5_core(OE) sysfillrect sysimgblt libahci mlxfw(OE) fb_sys_fops devlink ttm crct10dif_pclmul tg3 crct10dif_common mlx_compat(OE) drm megaraid_sas crc32c_intel libata ptp drm_panel_orientation_quirks pps_core mpt3sas(OE) raid_class scsi_transport_sas [last unloaded: osp] [494769.652260] CPU: 9 PID: 56386 Comm: ldlm_cn01_028 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1 [494769.664848] Hardware name: Dell Inc. 
PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018 [494769.672502] task: ffff89976bf91040 ti: ffff89976bf98000 task.ti: ffff89976bf98000 [494769.681766] LustreError: 53892:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff89648cec7000 [494769.681780] LustreError: 14070:0:(ldlm_lib.c:3264:target_bulk_io()) @@@ network error on bulk WRITE req@ffff8966676dbc50 x1624747850090736/t0(0) o4->56a7901d-9f1f-93d3-c75e-92d9b6eaca50@10.9.107.48@o2ib4:90/0 lens 488/448 e 2 to 0 dl 1550012070 ref 1 fl Interpret:/0/0 rc 0/0 [494769.683004] LNetError: 53889:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-103, 0) [494769.706892] NMI watchdog: BUG: soft lockup - CPU#30 stuck for 23s! [ldlm_cn02_027:56412] [494769.680070] RIP: 0010:[] [494769.706894] Modules linked in: osp(OE) mdd(OE) mdt(OE) lustre(OE) mdc(OE) lod(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat 
fat dm_round_robin dcdbas amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ses dm_multipath ipmi_si enclosure pcspkr dm_mod sg ipmi_devintf ccp i2c_piix4 ipmi_msghandler k10temp acpi_power_meter knem(OE) ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper ahci syscopyarea mlx5_core(OE) sysfillrect sysimgblt libahci mlxfw(OE) fb_sys_fops devlink ttm crct10dif_pclmul tg3 crct10dif_common mlx_compat(OE) drm megaraid_sas crc32c_intel libata ptp drm_panel_orientation_quirks pps_core mpt3sas(OE) raid_class scsi_transport_sas [last unloaded: osp] [494769.706943] CPU: 30 PID: 56412 Comm: ldlm_cn02_027 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1 [494769.706944] Hardware name: Dell Inc. 
PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018 [494769.706945] task: ffff8996ddabc100 ti: ffff8997b232c000 task.ti: ffff8997b232c000 [494769.706946] RIP: 0010:[] [494769.706950] [] native_queued_spin_lock_slowpath+0x122/0x200 [494769.706951] RSP: 0018:ffff8997b232fc38 EFLAGS: 00000246 [494769.706952] RAX: 0000000000000000 RBX: 00000001c0f61fd2 RCX: 0000000000f10000 [494769.706953] RDX: ffff8977ff75b780 RSI: 0000000000a90001 RDI: ffff8964b4b5cb5c [494769.706954] RBP: ffff8997b232fc38 R08: ffff8987ff7db780 R09: 0000000000000000 [494769.706954] R10: ffff89593fc07600 R11: ffff89840a3950c0 R12: ffff8997b232fbe0 [494769.706955] R13: ffff8980de370fc0 R14: ffff8997b232fba0 R15: ffffffffc0c6a378 [494769.706956] FS: 00007fb5676aa700(0000) GS:ffff8987ff7c0000(0000) knlGS:0000000000000000 [494769.706957] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [494769.706958] CR2: 00007fd79afb4018 CR3: 0000003dc7410000 CR4: 00000000003407e0 [494769.706959] Call Trace: [494769.706962] [] queued_spin_lock_slowpath+0xb/0xf [494769.706964] [] _raw_spin_lock+0x20/0x30 [494769.707000] [] lock_res_and_lock+0x2c/0x50 [ptlrpc] [494769.707031] [] ldlm_lock_cancel+0x2d/0x1f0 [ptlrpc] [494769.707065] [] ldlm_request_cancel+0x19b/0x740 [ptlrpc] [494769.707100] [] ldlm_handle_cancel+0xba/0x250 [ptlrpc] [494769.707133] [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] [494769.707173] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [494769.707210] [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [494769.707212] [] ? default_wake_function+0x12/0x20 [494769.707214] [] ? __wake_up_common+0x5b/0x90 [494769.707251] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] [494769.707287] [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [494769.707289] [] kthread+0xd1/0xe0 [494769.707291] [] ? insert_kthread_work+0x40/0x40 [494769.707293] [] ret_from_fork_nospec_begin+0xe/0x21 [494769.707294] [] ? 
insert_kthread_work+0x40/0x40 [494769.707295] Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 60 b9 34 9e 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b [494770.080904] [] native_queued_spin_lock_slowpath+0x126/0x200 [494770.088409] RSP: 0018:ffff89976bf9bc38 EFLAGS: 00000246 [494770.093807] RAX: 0000000000000000 RBX: 00000001c0f61fd2 RCX: 0000000000490000 [494770.101028] RDX: ffff89983f61b780 RSI: 0000000001190001 RDI: ffff8964b4b5cb5c [494770.108246] RBP: ffff89976bf9bc38 R08: ffff8977ff69b780 R09: 0000000000000000 [494770.115466] R10: ffff89593fc07600 R11: 0000000000000000 R12: ffff89976bf9bbe0 [494770.122685] R13: ffff8980de370fc0 R14: ffff89976bf9bba0 R15: ffffffffc0c6a378 [494770.129907] FS: 00007f76742cd700(0000) GS:ffff8977ff680000(0000) knlGS:0000000000000000 [494770.138079] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [494770.143911] CR2: 
00007f76743a2000 CR3: 0000003dc7410000 CR4: 00000000003407e0 [494770.151132] Call Trace: [494770.153676] [] queued_spin_lock_slowpath+0xb/0xf [494770.160036] [] _raw_spin_lock+0x20/0x30 [494770.165645] [] lock_res_and_lock+0x2c/0x50 [ptlrpc] [494770.172293] [] ldlm_lock_cancel+0x2d/0x1f0 [ptlrpc] [494770.178944] [] ldlm_request_cancel+0x19b/0x740 [ptlrpc] [494770.185934] [] ldlm_handle_cancel+0xba/0x250 [ptlrpc] [494770.192755] [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] [494770.199757] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [494770.207529] [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [494770.214402] [] ? wake_up_state+0x20/0x20 [494770.220094] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] [494770.226481] [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [494770.233964] [] kthread+0xd1/0xe0 [494770.238928] [] ? insert_kthread_work+0x40/0x40 [494770.245108] [] ret_from_fork_nospec_begin+0xe/0x21 [494770.251632] [] ? insert_kthread_work+0x40/0x40 [494770.257809] Code: 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 60 b9 34 9e 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 41 8b 40 08 <85> c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b 17 0f b7 c2 [494770.711332] LustreError: 53894:0:(events.c:305:request_in_callback()) event type 2, status -103, service mdt_io [494770.721603] LustreError: 12275:0:(pack_generic.c:590:__lustre_unpack_msg()) message length 0 too small for magic/version check [494770.733081] LustreError: 12275:0:(sec.c:2068:sptlrpc_svc_unwrap_request()) error unpacking request from 12345-10.9.101.39@o2ib4 x1624674062164912 [494771.750772] LNetError: 53887:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) [494771.763380] LNetError: 53887:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 9 previous similar messages [494772.679010] Lustre: fir-MDT0000: Bulk IO read error with c3ee8e29-24b2-60ad-b950-c5ea318742ba (at 10.8.17.29@o2ib6), client will retry: rc -110 [494773.205573] 
LustreError: 12275:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.3.29@o2ib6: deadline 6:29s ago req@ffff896cb1f34c50 x1624753478284192/t0(0) o4->c694e053-04d0-ee79-c9a4-0ace9e2f2c9a@10.8.3.29@o2ib6:52/0 lens 504/0 e 0 to 0 dl 1550012032 ref 1 fl Interpret:/0/ffffffff rc 0/-1 [494773.237071] LustreError: 12275:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 19 previous similar messages [494777.657112] LustreError: 14089:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk WRITE after 21+20s req@ffff896c9f7bf450 x1624712545992112/t0(0) o4->4dcaa581-be64-fe6b-fa97-73c2b004579c@10.8.13.13@o2ib6:65/0 lens 488/448 e 1 to 0 dl 1550012045 ref 1 fl Interpret:/0/0 rc 0/0 [494777.682164] Lustre: 14089:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (32:20s); client may timeout. req@ffff896c9f7bf450 x1624712545992112/t0(0) o4->4dcaa581-be64-fe6b-fa97-73c2b004579c@10.8.13.13@o2ib6:65/0 lens 488/448 e 1 to 0 dl 1550012045 ref 1 fl Complete:/0/ffffffff rc -110/-1 [494777.711340] Lustre: 14089:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 26 previous similar messages [494778.361537] LustreError: 12703:0:(sec.c:2362:sptlrpc_svc_unwrap_bulk()) @@@ truncated bulk GET 0(131072) req@ffff89968af1a850 x1624748528933040/t0(0) o4->2c2d2f6b-3086-1f70-68ed-98263873eaff@10.9.107.25@o2ib4:93/0 lens 488/448 e 2 to 0 dl 1550012073 ref 1 fl Interpret:/0/0 rc 0/0 [494784.496283] LustreError: 12242:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk WRITE after 25+28s req@ffff896caf651850 x1624700720498896/t0(0) o4->c50e9e63-bc69-ffb4-d9c5-0a1d77a8b849@10.9.106.60@o2ib4:64/0 lens 488/448 e 1 to 0 dl 1550012044 ref 1 fl Interpret:/0/0 rc 0/0 [494784.521397] LustreError: 12242:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 1 previous similar message [494787.673443] Lustre: 12428:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 
1550012038/real 1550012038] req@ffff89974273ad00 x1624928883031232/t0(0) o104->fir-MDT0002@10.9.107.43@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1550012075 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 [494787.700870] Lustre: 12428:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 17 previous similar messages [494791.517457] LustreError: 12353:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff897491b5ac50 x1624705455122976/t0(0) o3->f0b653e6-2546-c659-97c8-5d3c41619c38@10.8.15.9@o2ib6:105/0 lens 488/440 e 1 to 0 dl 1550012085 ref 1 fl Interpret:/0/0 rc 0/0 [494791.541419] LustreError: 12353:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 36 previous similar messages [494791.551007] Lustre: fir-MDT0002: Bulk IO read error with f0b653e6-2546-c659-97c8-5d3c41619c38 (at 10.8.15.9@o2ib6), client will retry: rc -110 [494792.969288] Lustre: fir-MDT0002: Connection restored to 5fef32e8-af17-7dc4-aa00-fb840b7124e5 (at 10.8.21.28@o2ib6) [494792.979728] Lustre: Skipped 498 previous similar messages [494796.158592] Lustre: fir-MDT0000: Bulk IO read error with c243879d-6590-e58d-10d6-105c5b7b4def (at 10.8.28.1@o2ib6), client will retry: rc -110 [494800.532015] Lustre: fir-MDT0000: Client f87e49de-2ac6-e466-4f82-af6dcb4e090b (at 10.9.112.13@o2ib4) reconnecting [494800.542276] Lustre: Skipped 1276 previous similar messages [494800.797711] Lustre: fir-MDT0002: Bulk IO write error with fbb7e0da-3603-8dfb-de71-fd8cea5618ef (at 10.9.106.69@o2ib4), client will retry: rc = -110 [494800.811033] Lustre: Skipped 51 previous similar messages [494801.137310] Lustre: MGS: Received new LWP connection from 10.8.1.36@o2ib6, removing former export from same NID [494801.147481] Lustre: Skipped 594 previous similar messages [494803.365868] LustreError: 12239:0:(ldlm_lib.c:3264:target_bulk_io()) @@@ network error on bulk WRITE req@ffff895c92bbb050 x1624737493187184/t0(0) o4->7d78b5a7-dae3-eca3-5a98-d1b9fe987149@10.8.17.22@o2ib6:140/0 lens 488/448 e 4 to 0 dl 1550012120 ref 1 fl Interpret:/0/0 
rc 0/0 [494803.365880] LustreError: 11487:0:(sec.c:2362:sptlrpc_svc_unwrap_bulk()) @@@ truncated bulk GET 0(131072) req@ffff895c92bbc050 x1624761695856368/t0(0) o4->72da9a6e-2827-1c9d-1aa6-7b398153fee1@10.9.106.71@o2ib4:140/0 lens 488/448 e 4 to 0 dl 1550012120 ref 1 fl Interpret:/0/0 rc 0/0 [494803.415363] LustreError: 12239:0:(ldlm_lib.c:3264:target_bulk_io()) Skipped 2 previous similar messages [494817.256153] LustreError: 54055:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 100s: evicting client at 10.9.107.43@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8961ad37bf00/0x1a4b7ac7cd13f525 lrc: 3/0,0 mode: PR/PR res: [0x2c0001754:0x2c72:0x0].0x0 bits 0x40/0x0 rrc: 279072 type: IBT flags: 0x60000400010020 nid: 10.9.107.43@o2ib4 remote: 0xfb11df74b0a01225 expref: 281172 pid: 12312 timeout: 494804 lvb_type: 0 [494817.294968] LustreError: 54055:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 56031 previous similar messages [494821.759239] Lustre: fir-MDT0000: Bulk IO read error with fe789eb3-1cd9-3594-b889-6606ba1b8e4a (at 10.9.113.2@o2ib4), client will retry: rc -110 [494821.762710] LustreError: 12331:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff898027225a00 x1624928888669472/t0(0) o104->fir-MDT0002@10.9.107.43@o2ib4:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 [494828.366397] LustreError: 54057:0:(ldlm_lib.c:3264:target_bulk_io()) @@@ network error on bulk READ req@ffff896fc0e4c450 x1624698312108288/t0(0) o256->3ec34a0b-5eeb-5ed7-db8f-a8c98a95c73a@10.9.103.3@o2ib4:156/0 lens 304/240 e 2 to 0 dl 1550012136 ref 1 fl Interpret:/0/0 rc 0/0 [494828.366791] LustreError: 14151:0:(sec.c:2362:sptlrpc_svc_unwrap_bulk()) @@@ truncated bulk GET 0(131072) req@ffff8973867c9450 x1624705310404736/t0(0) o4->9114d135-37ef-ec55-8f47-397cb880c459@10.8.28.6@o2ib6:147/0 lens 488/448 e 4 to 0 dl 1550012127 ref 1 fl Interpret:/0/0 rc 0/0 [494828.366793] LustreError: 14151:0:(sec.c:2362:sptlrpc_svc_unwrap_bulk()) Skipped 
12 previous similar messages [494828.366893] LustreError: 14093:0:(ldlm_lib.c:3273:target_bulk_io()) @@@ truncated bulk READ 0(22139) req@ffff8980f5a4e850 x1624698008813744/t0(0) o3->02a649b1-8bdf-b8d1-3375-fceaf78b0f08@10.9.103.1@o2ib4:144/0 lens 488/440 e 4 to 0 dl 1550012124 ref 1 fl Interpret:/0/0 rc 0/0 [494849.630903] NMI watchdog: BUG: soft lockup - CPU#21 stuck for 22s! [ldlm_bl_45:11021] [494849.638820] Modules linked in: osp(OE) mdd(OE) mdt(OE) lustre(OE) mdc(OE) lod(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin dcdbas amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ses dm_multipath ipmi_si enclosure pcspkr dm_mod sg ipmi_devintf ccp i2c_piix4 ipmi_msghandler k10temp acpi_power_meter knem(OE) ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif [494849.711819] crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper ahci syscopyarea mlx5_core(OE) sysfillrect sysimgblt libahci mlxfw(OE) fb_sys_fops devlink ttm crct10dif_pclmul tg3 crct10dif_common mlx_compat(OE) drm megaraid_sas crc32c_intel libata ptp drm_panel_orientation_quirks pps_core mpt3sas(OE) raid_class scsi_transport_sas [last unloaded: osp] [494849.745243] CPU: 21 PID: 11021 Comm: ldlm_bl_45 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1 [494849.757660] Hardware name: Dell Inc. 
PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018 [494849.765313] task: ffff8996d7b08000 ti: ffff8997f73c0000 task.ti: ffff8997f73c0000 [494849.772878] RIP: 0010:[] [] ldlm_inodebits_compat_queue+0x19f/0x440 [ptlrpc] [494849.783278] RSP: 0018:ffff8997f73c3bb0 EFLAGS: 00000246 [494849.788679] RAX: 0000000000000001 RBX: ffff89940766c860 RCX: ffff89940766c860 [494849.795898] RDX: ffff8997f73c3ca8 RSI: ffff8995782bee40 RDI: ffff89951daa3a80 [494849.803115] RBP: ffff8997f73c3c08 R08: ffff8997f73c3ca8 R09: 00000000c00043c6 [494849.810337] R10: 0000000000000046 R11: ffff8995782bee40 R12: ffff8997f73c3ca8 [494849.817554] R13: 00000000c00043c6 R14: 0000000000000046 R15: ffff8995782bee40 [494849.824777] FS: 00007fefd5037840(0000) GS:ffff8977ff740000(0000) knlGS:0000000000000000 [494849.832948] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [494849.838780] CR2: 00007ffd11ee88a8 CR3: 0000003dc7410000 CR4: 00000000003407e0 [494849.846001] Call Trace: [494849.848578] [] ldlm_process_inodebits_lock+0x93/0x3d0 [ptlrpc] [494849.856173] [] ? ldlm_inodebits_compat_queue+0x440/0x440 [ptlrpc] [494849.864033] [] ldlm_reprocess_queue+0x1be/0x3f0 [ptlrpc] [494849.871112] [] __ldlm_reprocess_all+0x10b/0x380 [ptlrpc] [494849.878184] [] ldlm_cancel_lock_for_export.isra.26+0x1c2/0x390 [ptlrpc] [494849.886556] [] ldlm_export_cancel_blocked_locks+0x121/0x200 [ptlrpc] [494849.894673] [] ldlm_bl_thread_main+0x112/0x700 [ptlrpc] [494849.901638] [] ? wake_up_state+0x20/0x20 [494849.907323] [] ? ldlm_handle_bl_callback+0x530/0x530 [ptlrpc] [494849.914802] [] kthread+0xd1/0xe0 [494849.919766] [] ? insert_kthread_work+0x40/0x40 [494849.925947] [] ret_from_fork_nospec_begin+0xe/0x21 [494849.932471] [] ? 
insert_kthread_work+0x40/0x40 [494849.938649] Code: 4c 8d a8 f0 fd ff ff 4d 39 fd 74 2b 49 83 bd a8 00 00 00 00 74 0e 4c 89 f2 48 89 de 4c 89 ef e8 a8 a2 fc ff 4d 8b ad 10 02 00 00 <49> 81 ed 10 02 00 00 4d 39 fd 75 d5 31 f6 48 8b 45 c0 48 39 45 [494868.471221] Lustre: MGS: Connection restored to f6a7596e-0d5c-7952-7c9a-11b9c0266364 (at 10.8.2.4@o2ib6) [494868.480786] Lustre: Skipped 2417 previous similar messages [494918.057629] LNet: Service thread pid 12428 was inactive for 200.37s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: [494918.074649] LNet: Skipped 6 previous similar messages [494918.079797] Pid: 12428, comm: mdt03_029 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 [494918.089624] Call Trace: [494918.092183] [] ldlm_completion_ast+0x5b1/0x920 [ptlrpc] [494918.099232] [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] [494918.106511] [] mdt_dom_discard_data+0x101/0x130 [mdt] [494918.113375] [] mdt_reint_unlink+0x331/0x14a0 [mdt] [494918.119947] [] mdt_reint_rec+0x83/0x210 [mdt] [494918.126081] [] mdt_reint_internal+0x6e3/0xaf0 [mdt] [494918.132737] [] mdt_reint+0x67/0x140 [mdt] [494918.138528] [] tgt_request_handle+0xaea/0x1580 [ptlrpc] [494918.145566] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [494918.153373] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] [494918.159795] [] kthread+0xd1/0xe0 [494918.164788] [] ret_from_fork_nospec_begin+0xe/0x21 [494918.171348] [] 0xffffffffffffffff [494918.176466] LustreError: dumping log to /tmp/lustre-log.1550012206.12428 [494969.245607] Lustre: MGS: Received new LWP connection from 10.9.113.10@o2ib4, removing former export from same NID [494969.255965] Lustre: Skipped 424 previous similar messages [494972.120112] Lustre: fir-MDT0000: Client 5aa81f57-e870-e6bb-de05-2c8d24b54371 (at 10.8.2.4@o2ib6) reconnecting [494972.130111] Lustre: Skipped 1011 previous similar messages [495032.090988] Lustre: fir-MDT0000: Connection 
restored to 7278084b-1c41-6d8b-189c-2e5e7608a6e2 (at 10.9.102.37@o2ib4) [495032.101512] Lustre: Skipped 12 previous similar messages [495117.417637] LustreError: 12428:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1550012105, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8995782bee40/0x1a4b7ac7cdfc09f2 lrc: 3/0,1 mode: --/PW res: [0x2c0001754:0x2c72:0x0].0x0 bits 0x40/0x0 rrc: 273329 type: IBT flags: 0x40010080000000 nid: local remote: 0x0 expref: -99 pid: 12428 timeout: 0 lvb_type: 0 [495480.999049] Lustre: fir-MDT0000: Client b5ad8834-3caa-3b3a-aeae-c877aabb1ef0 (at 10.8.2.6@o2ib6) reconnecting [495481.009049] Lustre: Skipped 4 previous similar messages [495481.014410] Lustre: fir-MDT0000: Connection restored to a61413e9-0563-2e7a-41f8-ffa5690ecb59 (at 10.8.2.6@o2ib6) [495481.024715] Lustre: Skipped 5 previous similar messages [495545.959306] Lustre: 56367:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:1s); client may timeout. 
req@ffff8985ee6ce300 x1624700769938992/t0(0) o103->1cc72755-0b18-692a-013c-e5abb0ad9b59@10.9.106.44@o2ib4:98/0 lens 328/192 e 0 to 0 dl 1550012833 ref 2 fl Complete:H/0/0 rc 0/0 [495545.987732] Lustre: 56367:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 2 previous similar messages [495545.998179] LustreError: 56367:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.9.106.44@o2ib4: deadline 6:1s ago req@ffff8983b0a21050 x1624700769939552/t0(0) o103->1cc72755-0b18-692a-013c-e5abb0ad9b59@10.9.106.44@o2ib4:98/0 lens 328/0 e 0 to 0 dl 1550012833 ref 2 fl Interpret:H/0/ffffffff rc 0/-1 [495546.437193] Lustre: 12439:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1550012827/real 1550012827] req@ffff896166bf7500 x1624928968809696/t0(0) o104->fir-MDT0002@10.9.106.64@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1550012834 ref 1 fl Rpc:RX/0/ffffffff rc 0/-1 [495546.464707] Lustre: 12439:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages [495546.474576] Lustre: fir-MDT0002-osp-MDT0000: Connection to fir-MDT0002 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete [495546.489602] Lustre: Skipped 30 previous similar messages [495546.629172] Lustre: MGS: Received new LWP connection from 10.8.21.30@o2ib6, removing former export from same NID [495546.639458] Lustre: Skipped 2 previous similar messages [495548.141247] LustreError: 11490:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff896cb5bc5050 x1624701091917728/t0(0) o4->838e95ba-7d72-299c-b143-b6a5f04f9e78@10.9.107.36@o2ib4:126/0 lens 488/448 e 1 to 0 dl 1550012861 ref 1 fl Interpret:/0/0 rc 0/0 [495548.165459] LustreError: 11490:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 61 previous similar messages [495548.175053] Lustre: fir-MDT0002: Bulk IO write error with 838e95ba-7d72-299c-b143-b6a5f04f9e78 (at 10.9.107.36@o2ib4), client 
will retry: rc = -110 [495548.188345] Lustre: Skipped 69 previous similar messages [495548.200234] Lustre: 54091:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:3s); client may timeout. req@ffff896dae786f00 x1624700769937296/t0(0) o103->1cc72755-0b18-692a-013c-e5abb0ad9b59@10.9.106.44@o2ib4:98/0 lens 328/192 e 0 to 0 dl 1550012833 ref 2 fl Complete:H/0/0 rc 0/0 [495548.228631] Lustre: 54091:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 88 previous similar messages [495551.367373] Lustre: 13354:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 12s req@ffff895e8ae48450 x1624928968809648/t0(0) o1000->fir-MDT0000-mdtlov_UUID@0@lo:0/0 lens 304/0 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1 [495551.389330] Lustre: 13354:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 34 previous similar messages [495552.604215] Lustre: 56360:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:7s); client may timeout. 
req@ffff896730b1c200 x1624700769938128/t0(0) o103->1cc72755-0b18-692a-013c-e5abb0ad9b59@10.9.106.44@o2ib4:98/0 lens 328/192 e 0 to 0 dl 1550012833 ref 2 fl Complete:H/0/0 rc 0/0 [495552.632591] Lustre: 56360:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 4 previous similar messages [495554.699165] Lustre: 12304:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1550012830/real 1550012834] req@ffff8975eaa9dd00 x1624928968810496/t0(0) o104->fir-MDT0000@10.8.2.14@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1550012837 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 [495554.726411] Lustre: 12304:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 30 previous similar messages [495554.884183] Lustre: fir-OST0014-osc-MDT0002: Connection to fir-OST0014 (at 10.0.10.103@o2ib7) was lost; in progress operations using this service will wait for recovery to complete [495554.900338] Lustre: Skipped 16 previous similar messages [495555.081002] Lustre: mdt_out: This server is not able to keep up with request traffic (cpu-bound). [495555.089967] Lustre: Skipped 1 previous similar message [495555.095208] Lustre: 11472:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=2 reqQ=0 recA=0, svcEst=1, delay=0 [495555.105306] Lustre: 11472:0:(service.c:1541:ptlrpc_at_check_timed()) Skipped 1 previous similar message [495555.170639] Lustre: 11472:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-10s), not sending early reply. Consider increasing at_early_margin (5)? 
req@ffff895e8ae48450 x1624928968809648/t0(0) o1000->fir-MDT0000-mdtlov_UUID@0@lo:98/0 lens 304/0 e 0 to 0 dl 1550012833 ref 1 fl Interpret:/0/ffffffff rc 0/-1 [495555.200057] Lustre: 11472:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 54 previous similar messages [495557.604095] LustreError: 14169:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff896c43325850 x1624759242111232/t0(0) o4->4f6b42bd-8393-db19-0238-71ebc8ff53fb@10.8.29.1@o2ib6:129/0 lens 488/448 e 1 to 0 dl 1550012864 ref 1 fl Interpret:/0/0 rc 0/0 [495557.628138] LustreError: 14169:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 33 previous similar messages [495557.637728] Lustre: fir-MDT0000: Bulk IO write error with 4f6b42bd-8393-db19-0238-71ebc8ff53fb (at 10.8.29.1@o2ib6), client will retry: rc = -110 [495557.650851] Lustre: Skipped 33 previous similar messages [495561.346748] Lustre: 56400:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:16s); client may timeout. req@ffff89849166f500 x1624700769938864/t0(0) o103->1cc72755-0b18-692a-013c-e5abb0ad9b59@10.9.106.44@o2ib4:98/0 lens 328/192 e 0 to 0 dl 1550012833 ref 2 fl Complete:H/0/0 rc 0/0 [495561.375212] Lustre: 56400:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 6 previous similar messages [495562.770474] LustreError: 14063:0:(ldlm_lib.c:3207:target_bulk_io()) @@@ bulk WRITE failed: rc -107 req@ffff8966162c8050 x1624672799453312/t0(0) o4->5ce4aa84-f2c6-c26d-932a-86a3d1404c49@10.9.102.20@o2ib4:133/0 lens 488/448 e 0 to 0 dl 1550012868 ref 1 fl Interpret:/2/0 rc 0/0 [495565.477864] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [ldlm_cn01_030:56388] [495565.537865] NMI watchdog: BUG: soft lockup - CPU#8 stuck for 22s! 
[ldlm_cn00_022:56408] [495565.485953] Modules linked in: osp(OE) mdd(OE) mdt(OE) lustre(OE) mdc(OE) lod(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin dcdbas amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel [495565.537868] Modules linked in: osp(OE) mdd(OE) mdt(OE) lustre(OE) mdc(OE) lod(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin dcdbas amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ses dm_multipath ipmi_si enclosure pcspkr dm_mod sg ipmi_devintf ccp i2c_piix4 ipmi_msghandler k10temp acpi_power_meter knem(OE) ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper ahci syscopyarea mlx5_core(OE) sysfillrect sysimgblt libahci mlxfw(OE) fb_sys_fops devlink ttm crct10dif_pclmul tg3 crct10dif_common mlx_compat(OE) drm megaraid_sas crc32c_intel libata ptp drm_panel_orientation_quirks pps_core mpt3sas(OE) raid_class scsi_transport_sas [last unloaded: osp] [495565.537934] CPU: 8 PID: 56408 Comm: ldlm_cn00_022 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1 [495565.537934] Hardware name: Dell Inc. 
PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018 [495565.537936] task: ffff8996ddab8000 ti: ffff8996ddab4000 task.ti: ffff8996ddab4000 [495565.537937] RIP: 0010:[] [495565.537943] [] native_queued_spin_lock_slowpath+0x122/0x200 [495565.537944] RSP: 0018:ffff8996ddab7bb8 EFLAGS: 00000246 [495565.537945] RAX: 0000000000000000 RBX: ffff8996ddab7bf0 RCX: 0000000000410000 [495565.537946] RDX: ffff8977ff81b780 RSI: 0000000001090001 RDI: ffff8987982e40dc [495565.537947] RBP: ffff8996ddab7bb8 R08: ffff8967fee9b780 R09: 0000000000000000 [495565.537947] R10: 0000000000000096 R11: 000000005c63519c R12: 0000000000000000 [495565.537948] R13: 0000000000000000 R14: 0000000000000002 R15: ffff89637b689b00 [495565.537949] FS: 00007fc48247a900(0000) GS:ffff8967fee80000(0000) knlGS:0000000000000000 [495565.537950] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [495565.537951] CR2: 00007fefd45bf1a0 CR3: 00000030389a6000 CR4: 00000000003407e0 [495565.537952] Call Trace: [495565.537958] [] queued_spin_lock_slowpath+0xb/0xf [495565.537962] [] _raw_spin_lock+0x20/0x30 [495565.538003] [] lock_res_and_lock+0x2c/0x50 [ptlrpc] [495565.538031] [] ldlm_cancel_callback+0x92/0x330 [ptlrpc] [495565.538040] [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [495565.538046] [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [495565.538072] [] ldlm_lock_cancel+0x56/0x1f0 [ptlrpc] [495565.538104] [] ldlm_request_cancel+0x19b/0x740 [ptlrpc] [495565.538134] [] ldlm_handle_cancel+0xba/0x250 [ptlrpc] [495565.538163] [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] [495565.538200] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [495565.538234] [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [495565.538239] [] ? wake_up_state+0x20/0x20 [495565.538272] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] [495565.538307] [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [495565.538311] [] kthread+0xd1/0xe0 [495565.538312] [] ? insert_kthread_work+0x40/0x40 [495565.538315] [] ret_from_fork_nospec_begin+0xe/0x21 [495565.538316] [] ? 
insert_kthread_work+0x40/0x40 [495565.538317] Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 60 b9 34 9e 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b [495565.761871] NMI watchdog: BUG: soft lockup - CPU#34 stuck for 22s! 
[ldlm_cn02_006:54102] [495565.761872] Modules linked in: osp(OE) mdd(OE) mdt(OE) lustre(OE) mdc(OE) lod(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin dcdbas amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ses dm_multipath ipmi_si enclosure pcspkr dm_mod sg ipmi_devintf ccp i2c_piix4 ipmi_msghandler k10temp acpi_power_meter knem(OE) ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper ahci syscopyarea mlx5_core(OE) sysfillrect sysimgblt libahci mlxfw(OE) fb_sys_fops devlink ttm crct10dif_pclmul tg3 crct10dif_common mlx_compat(OE) drm megaraid_sas crc32c_intel libata ptp drm_panel_orientation_quirks pps_core mpt3sas(OE) raid_class scsi_transport_sas [last unloaded: osp] [495565.761951] CPU: 34 PID: 54102 Comm: ldlm_cn02_006 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1 [495565.761952] Hardware name: Dell Inc. 
PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018 [495565.761953] task: ffff89581aafa080 ti: ffff8997e6e9c000 task.ti: ffff8997e6e9c000 [495565.761955] RIP: 0010:[] [495565.761961] [] native_queued_spin_lock_slowpath+0x122/0x200 [495565.761962] RSP: 0018:ffff8997e6e9fc38 EFLAGS: 00000246 [495565.761963] RAX: 0000000000000000 RBX: 00000001c0f61fd2 RCX: 0000000001110000 [495565.761964] RDX: ffff8967feedb780 RSI: 0000000000610001 RDI: ffff8987982e40dc [495565.761965] RBP: ffff8997e6e9fc38 R08: ffff8987ff81b780 R09: 0000000000000000 [495565.761966] R10: ffff89593fc07600 R11: 0000000000000000 R12: ffff8997e6e9fbe0 [495565.761966] R13: ffff8986346fafc0 R14: ffff8997e6e9fba0 R15: ffffffffc0c6a378 [495565.761968] FS: 00007fb5666a8700(0000) GS:ffff8987ff800000(0000) knlGS:0000000000000000 [495565.761969] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [495565.761970] CR2: 00007fb56a20c000 CR3: 0000003dc7410000 CR4: 00000000003407e0 [495565.761971] Call Trace: [495565.761977] [] queued_spin_lock_slowpath+0xb/0xf [495565.761981] [] _raw_spin_lock+0x20/0x30 [495565.762029] [] lock_res_and_lock+0x2c/0x50 [ptlrpc] [495565.762063] [] ldlm_lock_cancel+0x2d/0x1f0 [ptlrpc] [495565.762101] [] ldlm_request_cancel+0x19b/0x740 [ptlrpc] [495565.762138] [] ldlm_handle_cancel+0xba/0x250 [ptlrpc] [495565.762174] [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] [495565.762218] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [495565.762258] [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [495565.762265] [] ? wake_up_state+0x20/0x20 [495565.762305] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] [495565.762344] [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [495565.762348] [] kthread+0xd1/0xe0 [495565.762350] [] ? insert_kthread_work+0x40/0x40 [495565.762353] [] ret_from_fork_nospec_begin+0xe/0x21 [495565.762355] [] ? 
insert_kthread_work+0x40/0x40 [495565.762356] Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 60 b9 34 9e 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b [495565.822871] NMI watchdog: BUG: soft lockup - CPU#41 stuck for 22s! 
[ldlm_cn01_019:56377] [495565.822872] Modules linked in: osp(OE) mdd(OE) mdt(OE) lustre(OE) mdc(OE) lod(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin dcdbas amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ses dm_multipath ipmi_si enclosure pcspkr dm_mod sg ipmi_devintf ccp i2c_piix4 ipmi_msghandler k10temp acpi_power_meter knem(OE) ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper ahci syscopyarea mlx5_core(OE) sysfillrect sysimgblt libahci mlxfw(OE) fb_sys_fops devlink ttm crct10dif_pclmul tg3 crct10dif_common mlx_compat(OE) drm megaraid_sas crc32c_intel libata ptp drm_panel_orientation_quirks pps_core mpt3sas(OE) raid_class scsi_transport_sas [last unloaded: osp] [495565.822935] CPU: 41 PID: 56377 Comm: ldlm_cn01_019 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1 [495565.822936] Hardware name: Dell Inc. 
PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018 [495565.822937] task: ffff8996a0206180 ti: ffff8997deaa0000 task.ti: ffff8997deaa0000 [495565.822938] RIP: 0010:[] [495565.822946] [] native_queued_spin_lock_slowpath+0x122/0x200 [495565.822947] RSP: 0018:ffff8997deaa3c38 EFLAGS: 00000246 [495565.822948] RAX: 0000000000000000 RBX: 00000001c0f61fd2 RCX: 0000000001490000 [495565.822949] RDX: ffff89983f45b780 RSI: 0000000000390001 RDI: ffff8987982e40dc [495565.822950] RBP: ffff8997deaa3c38 R08: ffff8977ff89b780 R09: 0000000000000000 [495565.822951] R10: ffff89593fc07600 R11: ffff896d4da88840 R12: ffff8997deaa3be0 [495565.822951] R13: ffff8986346fafc0 R14: ffff8997deaa3ba0 R15: ffffffffc0c6a378 [495565.822953] FS: 00007fefd5037840(0000) GS:ffff8977ff880000(0000) knlGS:0000000000000000 [495565.822954] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [495565.822955] CR2: 000056169deab300 CR3: 0000003dc7410000 CR4: 00000000003407e0 [495565.822956] Call Trace: [495565.822962] [] queued_spin_lock_slowpath+0xb/0xf [495565.822966] [] _raw_spin_lock+0x20/0x30 [495565.823003] [] lock_res_and_lock+0x2c/0x50 [ptlrpc] [495565.823036] [] ldlm_lock_cancel+0x2d/0x1f0 [ptlrpc] [495565.823073] [] ldlm_request_cancel+0x19b/0x740 [ptlrpc] [495565.823109] [] ldlm_handle_cancel+0xba/0x250 [ptlrpc] [495565.823144] [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] [495565.823188] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [495565.823227] [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [495565.823233] [] ? default_wake_function+0x12/0x20 [495565.823236] [] ? __wake_up_common+0x5b/0x90 [495565.823275] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] [495565.823313] [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [495565.823318] [] kthread+0xd1/0xe0 [495565.823320] [] ? insert_kthread_work+0x40/0x40 [495565.823323] [] ret_from_fork_nospec_begin+0xe/0x21 [495565.823325] [] ? 
insert_kthread_work+0x40/0x40 [495565.823325] Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 60 b9 34 9e 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b [495565.866877] NMI watchdog: BUG: soft lockup - CPU#46 stuck for 22s! 
[ldlm_cn02_007:56230] [495565.866878] Modules linked in: osp(OE) mdd(OE) mdt(OE) lustre(OE) mdc(OE) lod(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin dcdbas amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ses dm_multipath ipmi_si enclosure pcspkr dm_mod sg ipmi_devintf ccp i2c_piix4 ipmi_msghandler k10temp acpi_power_meter knem(OE) ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper ahci syscopyarea mlx5_core(OE) sysfillrect sysimgblt libahci mlxfw(OE) fb_sys_fops devlink ttm crct10dif_pclmul tg3 crct10dif_common mlx_compat(OE) drm megaraid_sas crc32c_intel libata ptp drm_panel_orientation_quirks pps_core mpt3sas(OE) raid_class scsi_transport_sas [last unloaded: osp] [495565.866954] CPU: 46 PID: 56230 Comm: ldlm_cn02_007 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1 [495565.866955] Hardware name: Dell Inc. 
PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018 [495565.866956] task: ffff89960caf6180 ti: ffff899587bb8000 task.ti: ffff899587bb8000 [495565.866957] RIP: 0010:[] [495565.866965] [] native_queued_spin_lock_slowpath+0x122/0x200 [495565.866966] RSP: 0018:ffff899587bbbbb8 EFLAGS: 00000246 [495565.866967] RAX: 0000000000000000 RBX: ffff899587bbbbf0 RCX: 0000000001710000 [495565.866968] RDX: ffff8987ff71b780 RSI: 0000000000910000 RDI: ffff8987982e40dc [495565.866968] RBP: ffff899587bbbbb8 R08: ffff8987ff8db780 R09: 0000000000000000 [495565.866969] R10: ffff89593fc07600 R11: 0000000000000000 R12: 0000000000000000 [495565.866970] R13: 0000000000000000 R14: 0000000000000002 R15: ffff898367aad580 [495565.866971] FS: 00007f7f46277740(0000) GS:ffff8987ff8c0000(0000) knlGS:0000000000000000 [495565.866972] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [495565.866973] CR2: 00007f7f45d611e5 CR3: 0000003dc7410000 CR4: 00000000003407e0 [495565.866975] Call Trace: [495565.866981] [] queued_spin_lock_slowpath+0xb/0xf [495565.866985] [] _raw_spin_lock+0x20/0x30 [495565.867028] [] lock_res_and_lock+0x2c/0x50 [ptlrpc] [495565.867062] [] ldlm_cancel_callback+0x92/0x330 [ptlrpc] [495565.867065] [] ? native_queued_spin_lock_slowpath+0x158/0x200 [495565.867097] [] ldlm_lock_cancel+0x56/0x1f0 [ptlrpc] [495565.867135] [] ldlm_request_cancel+0x19b/0x740 [ptlrpc] [495565.867171] [] ldlm_handle_cancel+0xba/0x250 [ptlrpc] [495565.867206] [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] [495565.867251] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [495565.867290] [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [495565.867296] [] ? wake_up_state+0x20/0x20 [495565.867335] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] [495565.867375] [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [495565.867379] [] kthread+0xd1/0xe0 [495565.867381] [] ? insert_kthread_work+0x40/0x40 [495565.867384] [] ret_from_fork_nospec_begin+0xe/0x21 [495565.867386] [] ? 
insert_kthread_work+0x40/0x40 [495565.867386] Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 60 b9 34 9e 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b [495566.949201] aesni_intel [495566.951854] lrw gf128mul glue_helper ablk_helper cryptd ses dm_multipath ipmi_si enclosure pcspkr dm_mod sg ipmi_devintf ccp i2c_piix4 ipmi_msghandler k10temp acpi_power_meter knem(OE) ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper ahci syscopyarea mlx5_core(OE) sysfillrect sysimgblt libahci mlxfw(OE) fb_sys_fops devlink ttm crct10dif_pclmul tg3 crct10dif_common mlx_compat(OE) drm megaraid_sas crc32c_intel libata ptp drm_panel_orientation_quirks pps_core mpt3sas(OE) raid_class scsi_transport_sas [last unloaded: osp] [495567.004906] CPU: 1 PID: 56388 Comm: ldlm_cn01_030 Kdump: loaded Tainted: 
G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1 [495567.017496] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018 [495567.025149] task: ffff89976bf930c0 ti: ffff8997bc750000 task.ti: ffff8997bc750000 [495567.032716] RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 [495567.042735] RSP: 0018:ffff8997bc753c38 EFLAGS: 00000246 [495567.048133] RAX: 0000000000000000 RBX: 00000001c0f61fd2 RCX: 0000000000090000 [495567.055352] RDX: ffff8967ff09b780 RSI: 0000000001410001 RDI: ffff8987982e40dc [495567.062571] RBP: ffff8997bc753c38 R08: ffff8977ff61b780 R09: 0000000000000000 [495567.069793] R10: ffff89593fc07600 R11: 0000000000000400 R12: ffff8997bc753be0 [495567.077013] R13: ffff8986346fafc0 R14: ffff8997bc753ba0 R15: ffffffffc0c6a378 [495567.084231] FS: 00007f7674330700(0000) GS:ffff8977ff600000(0000) knlGS:0000000000000000 [495567.092405] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [495567.098236] CR2: 00007f76743a2000 CR3: 0000003dc7410000 CR4: 00000000003407e0 [495567.105457] Call Trace: [495567.107999] [] queued_spin_lock_slowpath+0xb/0xf [495567.114352] [] _raw_spin_lock+0x20/0x30 [495567.119951] [] lock_res_and_lock+0x2c/0x50 [ptlrpc] [495567.126593] [] ldlm_lock_cancel+0x2d/0x1f0 [ptlrpc] [495567.133242] [] ldlm_request_cancel+0x19b/0x740 [ptlrpc] [495567.140237] [] ldlm_handle_cancel+0xba/0x250 [ptlrpc] [495567.147058] [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] [495567.154056] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [495567.161829] [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [495567.168699] [] ? default_wake_function+0x12/0x20 [495567.175053] [] ? __wake_up_common+0x5b/0x90 [495567.181010] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] [495567.187397] [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [495567.194873] [] kthread+0xd1/0xe0 [495567.199841] [] ? insert_kthread_work+0x40/0x40 [495567.206020] [] ret_from_fork_nospec_begin+0xe/0x21 [495567.212544] [] ? 
insert_kthread_work+0x40/0x40 [495567.218724] Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 60 b9 34 9e 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b [495569.139531] LustreError: 14077:0:(ldlm_lib.c:3207:target_bulk_io()) @@@ bulk WRITE failed: rc -107 req@ffff8997706a7050 x1624672799453248/t0(0) o4->5ce4aa84-f2c6-c26d-932a-86a3d1404c49@10.9.102.20@o2ib4:133/0 lens 488/448 e 0 to 0 dl 1550012868 ref 1 fl Interpret:/2/0 rc 0/0 [495570.283062] LNetError: 53887:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: tx_queue, 19 seconds [495570.293240] LNetError: 53887:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Skipped 3 previous similar messages [495570.303506] LNetError: 53887:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.203@o2ib7 (25): c: 0, oc: 0, rc: 8 [495570.315667] LNetError: 53887:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Skipped 3 previous similar messages [495570.325547] LNet: 53887:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.211@o2ib7: 26 seconds [495570.335897] LNet: 53887:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 192 previous similar messages [495572.940216] Lustre: 12222:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1550012837/real 1550012838] req@ffff8973a333c800 x1624928968810080/t0(0) o106->fir-MDT0002@10.9.107.46@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1550012844 ref 1 fl Rpc:RX/2/ffffffff rc 0/-1 [495572.967726] Lustre: 12222:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 31 previous similar messages [495573.620785] Lustre: ldlm_canceld: This server is not able to keep up with request traffic (cpu-bound). 
[495573.630183] Lustre: 8532:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=1 reqQ=0 recA=30, svcEst=58, delay=10 [495573.640448] Lustre: 8532:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-3s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff8997b42d9800 x1624700769948176/t0(0) o103->1cc72755-0b18-692a-013c-e5abb0ad9b59@10.9.106.44@o2ib4:123/0 lens 328/0 e 0 to 0 dl 1550012858 ref 2 fl New:H/0/ffffffff rc 0/-1 [495573.692069] NMI watchdog: BUG: soft lockup - CPU#26 stuck for 22s! [ldlm_cn02_017:56373] [495573.700245] Modules linked in: osp(OE) mdd(OE) mdt(OE) lustre(OE) mdc(OE) lod(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin dcdbas amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ses dm_multipath ipmi_si enclosure pcspkr dm_mod sg ipmi_devintf ccp i2c_piix4 ipmi_msghandler k10temp acpi_power_meter knem(OE) ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif [495573.773244] crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper ahci syscopyarea mlx5_core(OE) sysfillrect sysimgblt libahci mlxfw(OE) fb_sys_fops devlink ttm crct10dif_pclmul tg3 crct10dif_common mlx_compat(OE) drm megaraid_sas crc32c_intel libata ptp drm_panel_orientation_quirks pps_core mpt3sas(OE) raid_class scsi_transport_sas [last unloaded: osp] [495573.806668] CPU: 26 PID: 56373 Comm: ldlm_cn02_017 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1 [495573.819344] Hardware name: Dell Inc. 
PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018 [495573.826998] task: ffff8996a0202080 ti: ffff899730f50000 task.ti: ffff899730f50000 [495573.834564] RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 [495573.844592] RSP: 0018:ffff899730f53bb8 EFLAGS: 00000246 [495573.849992] RAX: 0000000000000000 RBX: ffff899730f53bf0 RCX: 0000000000d10000 [495573.857212] RDX: ffff8967ff05b780 RSI: 0000000001210001 RDI: ffff8987982e40dc [495573.864430] RBP: ffff899730f53bb8 R08: ffff8987ff79b780 R09: 0000000000000000 [495573.871651] R10: ffff89593fc07600 R11: ffff899730f538ee R12: 0000000000000000 [495573.878870] R13: 0000000000000000 R14: 0000000000000002 R15: ffff896073727080 [495573.886090] FS: 00007f37fbb43700(0000) GS:ffff8987ff780000(0000) knlGS:0000000000000000 [495573.894261] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [495573.900094] CR2: 00007f3c4aea9000 CR3: 000000203b27e000 CR4: 00000000003407e0 [495573.907315] Call Trace: [495573.909859] [] queued_spin_lock_slowpath+0xb/0xf [495573.916209] [] _raw_spin_lock+0x20/0x30 [495573.921821] [] lock_res_and_lock+0x2c/0x50 [ptlrpc] [495573.928469] [] ldlm_cancel_callback+0x92/0x330 [ptlrpc] [495573.935432] [] ? native_queued_spin_lock_slowpath+0x156/0x200 [495573.942941] [] ldlm_lock_cancel+0x56/0x1f0 [ptlrpc] [495573.949594] [] ldlm_request_cancel+0x19b/0x740 [ptlrpc] [495573.956586] [] ldlm_handle_cancel+0xba/0x250 [ptlrpc] [495573.963407] [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] [495573.970407] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [495573.978187] [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [495573.985059] [] ? default_wake_function+0x12/0x20 [495573.991411] [] ? __wake_up_common+0x5b/0x90 [495573.997368] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] [495574.003755] [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [495574.011233] [] kthread+0xd1/0xe0 [495574.016197] [] ? insert_kthread_work+0x40/0x40 [495574.022378] [] ret_from_fork_nospec_begin+0xe/0x21 [495574.028903] [] ? 
insert_kthread_work+0x40/0x40 [495574.035081] Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 60 b9 34 9e 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b [495576.364938] Lustre: fir-OST001a-osc-MDT0002: Connection to fir-OST001a (at 10.0.10.105@o2ib7) was lost; in progress operations using this service will wait for recovery to complete [495576.381090] Lustre: Skipped 14 previous similar messages [495581.593703] Lustre: 14052:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 6s req@ffff899767b40850 x1624673075081968/t0(0) o4->91d645ed-86a8-bf9b-39ba-6e32b02e94c5@10.9.102.18@o2ib4:0/0 lens 488/0 e 0 to 0 dl 0 ref 1 fl New:/2/ffffffff rc 0/-1 [495581.622270] Lustre: mdt_io: This server is not able to keep up with request traffic (cpu-bound). [495581.631141] Lustre: 12594:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=42 reqQ=0 recA=22, svcEst=59, delay=7011 [495585.969537] Lustre: 11495:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (33:1s); client may timeout. 
req@ffff8994def10050 x1624932162134688/t34371873637(0) o4->9041b005-ca79-5425-d710-65376539b634@10.9.107.59@o2ib4:138/0 lens 6680/416 e 1 to 0 dl 1550012873 ref 2 fl Complete:/0/0 rc 0/0 [495585.998779] Lustre: 11495:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1 previous similar message [495588.543662] Lustre: 11491:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 8s req@ffff896db1f47050 x1624747982474368/t0(0) o4->ca76035c-fba3-91d6-92f5-1fd97493e3e8@10.9.107.4@o2ib4:0/0 lens 488/0 e 0 to 0 dl 0 ref 1 fl New:/2/ffffffff rc 0/-1 [495588.567450] Lustre: 11491:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 2 previous similar messages [495589.683175] LustreError: 14153:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk WRITE after 0+4s req@ffff899778ba1850 x1624748397770560/t0(0) o4->50f28a0e-eb03-3ed4-df6b-96db06d3f42b@10.9.107.34@o2ib4:138/0 lens 488/448 e 1 to 0 dl 1550012873 ref 2 fl Interpret:/0/0 rc 0/0 [495589.778962] LustreError: 12375:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.9.107.43@o2ib4: deadline 33:4s ago req@ffff899778ba1450 x1624732783758928/t0(0) o4->4e48592f-b97d-5c93-9da4-86c872d7a486@10.9.107.43@o2ib4:138/0 lens 9320/0 e 1 to 0 dl 1550012873 ref 2 fl Interpret:/0/ffffffff rc 0/-1 [495589.810974] LustreError: 12375:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 86 previous similar messages [495589.873264] Lustre: 12594:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-3s), not sending early reply. Consider increasing at_early_margin (5)? 
req@ffff8997816c4450 x1624758366245840/t0(0) o4->2196625d-9992-1b8a-5a12-40751a9cdd4e@10.9.107.2@o2ib4:139/0 lens 488/0 e 1 to 0 dl 1550012874 ref 2 fl New:/0/ffffffff rc 0/-1 [495590.065794] Lustre: fir-MDT0000: Bulk IO read error with fe789eb3-1cd9-3594-b889-6606ba1b8e4a (at 10.9.113.2@o2ib4), client will retry: rc -110 [495590.078737] Lustre: Skipped 2 previous similar messages [495590.640492] Lustre: 12252:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 10s req@ffff896db1f46c50 x1624747982474528/t0(0) o4->ca76035c-fba3-91d6-92f5-1fd97493e3e8@10.9.107.4@o2ib4:0/0 lens 488/0 e 0 to 0 dl 0 ref 1 fl New:/2/ffffffff rc 0/-1 [495591.879590] Lustre: mgs: This server is not able to keep up with request traffic (cpu-bound). [495591.888202] Lustre: 8645:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=25 reqQ=27 recA=21, svcEst=34, delay=9091 [495591.898817] Lustre: 8645:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-3s), not sending early reply. Consider increasing at_early_margin (5)? 
req@ffff89942ae8dc50 x1624673096815008/t0(0) o400->978d15e1-4a6c-0fd0-304b-6f2d68182134@10.8.2.21@o2ib6:141/0 lens 224/0 e 0 to 0 dl 1550012876 ref 2 fl New:/0/ffffffff rc 0/-1 [495591.929618] Lustre: 8645:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 40 previous similar messages [495592.925295] LustreError: 12276:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff896c43324050 x1624712466046800/t0(0) o4->c34ae119-43a8-1066-416d-873c10713275@10.8.6.4@o2ib6:151/0 lens 488/448 e 2 to 0 dl 1550012886 ref 1 fl Interpret:/0/0 rc 0/0 [495592.949254] LustreError: 12276:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 14 previous similar messages [495592.958856] Lustre: fir-MDT0002: Bulk IO write error with c34ae119-43a8-1066-416d-873c10713275 (at 10.8.6.4@o2ib6), client will retry: rc = -110 [495592.971888] Lustre: Skipped 15 previous similar messages [495593.762572] NMI watchdog: BUG: soft lockup - CPU#34 stuck for 22s! [ldlm_cn02_006:54102] [495593.770747] Modules linked in: osp(OE) mdd(OE) mdt(OE) lustre(OE) mdc(OE) lod(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin dcdbas amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ses dm_multipath ipmi_si enclosure pcspkr dm_mod sg ipmi_devintf ccp i2c_piix4 ipmi_msghandler k10temp acpi_power_meter knem(OE) ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif [495593.867575] NMI watchdog: BUG: soft lockup - CPU#46 stuck for 22s! 
[ldlm_cn02_007:56230] [495593.843749] crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper ahci syscopyarea mlx5_core(OE) sysfillrect sysimgblt libahci mlxfw(OE) fb_sys_fops devlink ttm crct10dif_pclmul tg3 crct10dif_common mlx_compat(OE) drm megaraid_sas crc32c_intel libata [495593.867577] Modules linked in: osp(OE) mdd(OE) mdt(OE) lustre(OE) mdc(OE) lod(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin dcdbas amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ses dm_multipath ipmi_si enclosure pcspkr dm_mod sg ipmi_devintf ccp i2c_piix4 ipmi_msghandler k10temp acpi_power_meter knem(OE) ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper ahci syscopyarea mlx5_core(OE) sysfillrect sysimgblt libahci mlxfw(OE) fb_sys_fops devlink ttm crct10dif_pclmul tg3 crct10dif_common mlx_compat(OE) drm megaraid_sas crc32c_intel libata ptp drm_panel_orientation_quirks pps_core mpt3sas(OE) raid_class scsi_transport_sas [last unloaded: osp] [495593.867648] CPU: 46 PID: 56230 Comm: ldlm_cn02_007 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1 [495593.867649] Hardware name: Dell Inc. 
PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018 [495593.867650] task: ffff89960caf6180 ti: ffff899587bb8000 task.ti: ffff899587bb8000 [495593.867652] RIP: 0010:[] [495593.867659] [] native_queued_spin_lock_slowpath+0x122/0x200 [495593.867660] RSP: 0018:ffff899587bbbc38 EFLAGS: 00000246 [495593.867661] RAX: 0000000000000000 RBX: 00000001c0f61fd2 RCX: 0000000001710000 [495593.867661] RDX: ffff8967fee9b780 RSI: 0000000000410001 RDI: ffff8987982e40dc [495593.867662] RBP: ffff899587bbbc38 R08: ffff8987ff8db780 R09: 0000000000000000 [495593.867663] R10: ffff89593fc07600 R11: 0000000000000000 R12: ffff899587bbbbe0 [495593.867664] R13: ffff8986346fafc0 R14: ffff899587bbbba0 R15: ffffffffc0c6a378 [495593.867665] FS: 00007f7f46277740(0000) GS:ffff8987ff8c0000(0000) knlGS:0000000000000000 [495593.867666] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [495593.867667] CR2: 00007f7f45d611e5 CR3: 0000003dc7410000 CR4: 00000000003407e0 [495593.867668] Call Trace: [495593.867674] [] queued_spin_lock_slowpath+0xb/0xf [495593.867678] [] _raw_spin_lock+0x20/0x30 [495593.867720] [] lock_res_and_lock+0x2c/0x50 [ptlrpc] [495593.867755] [] ldlm_lock_cancel+0x2d/0x1f0 [ptlrpc] [495593.867792] [] ldlm_request_cancel+0x19b/0x740 [ptlrpc] [495593.867828] [] ldlm_handle_cancel+0xba/0x250 [ptlrpc] [495593.867863] [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] [495593.867904] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [495593.867943] [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [495593.867946] [] ? wake_up_state+0x20/0x20 [495593.867984] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] [495593.868022] [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [495593.868025] [] kthread+0xd1/0xe0 [495593.868027] [] ? insert_kthread_work+0x40/0x40 [495593.868029] [] ret_from_fork_nospec_begin+0xe/0x21 [495593.868031] [] ? 
insert_kthread_work+0x40/0x40 [495593.868032] Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 60 b9 34 9e 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b [495594.070735] LustreError: 12375:0:(mdt_io.c:442:mdt_preprw_write()) fir-MDT0002: BRW to missing obj [0x2c000339d:0xa72a:0x0] [495594.070868] LustreError: 12375:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.9.106.5@o2ib4: deadline 31:11s ago req@ffff89945b711050 x1624748329159312/t0(0) o4->d9657556-3698-de72-acc2-cb9f2581779e@10.9.106.5@o2ib4:136/0 lens 488/0 e 0 to 0 dl 1550012871 ref 1 fl Interpret:/2/ffffffff rc 0/-1 [495594.252955] ptp [495594.254912] drm_panel_orientation_quirks pps_core mpt3sas(OE) raid_class scsi_transport_sas [last unloaded: osp] [495594.264034] CPU: 34 PID: 54102 Comm: ldlm_cn02_006 Kdump: loaded Tainted: G OEL ------------ 
3.10.0-957.1.3.el7_lustre.x86_64 #1 [495594.276712] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018 [495594.284365] task: ffff89581aafa080 ti: ffff8997e6e9c000 task.ti: ffff8997e6e9c000 [495594.291931] RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 [495594.301951] RSP: 0018:ffff8997e6e9fbb8 EFLAGS: 00000246 [495594.307349] RAX: 0000000000000000 RBX: ffff8997e6e9fbf0 RCX: 0000000001110000 [495594.314568] RDX: ffff8987ff6db780 RSI: 0000000000710001 RDI: ffff8987982e40dc [495594.321786] RBP: ffff8997e6e9fbb8 R08: ffff8987ff81b780 R09: 0000000000000000 [495594.329007] R10: ffff89593fc07600 R11: 0000000000000000 R12: 0000000000000000 [495594.336228] R13: 0000000000000000 R14: 0000000000000002 R15: ffff89772e38ca40 [495594.343447] FS: 00007fb5666a8700(0000) GS:ffff8987ff800000(0000) knlGS:0000000000000000 [495594.351620] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [495594.357451] CR2: 00007fb56a20c000 CR3: 0000003dc7410000 CR4: 00000000003407e0 [495594.364670] Call Trace: [495594.367216] [] queued_spin_lock_slowpath+0xb/0xf [495594.373576] [] _raw_spin_lock+0x20/0x30 [495594.379189] [] lock_res_and_lock+0x2c/0x50 [ptlrpc] [495594.385834] [] ldlm_cancel_callback+0x92/0x330 [ptlrpc] [495594.392797] [] ? native_queued_spin_lock_slowpath+0x126/0x200 [495594.400307] [] ldlm_lock_cancel+0x56/0x1f0 [ptlrpc] [495594.406958] [] ldlm_request_cancel+0x19b/0x740 [ptlrpc] [495594.413952] [] ldlm_handle_cancel+0xba/0x250 [ptlrpc] [495594.420771] [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] [495594.427774] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [495594.435553] [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [495594.442425] [] ? wake_up_state+0x20/0x20 [495594.448120] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] [495594.454508] [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [495594.461984] [] kthread+0xd1/0xe0 [495594.466950] [] ? insert_kthread_work+0x40/0x40 [495594.473131] [] ret_from_fork_nospec_begin+0xe/0x21 [495594.479656] [] ? 
insert_kthread_work+0x40/0x40 [495594.485835] Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 60 b9 34 9e 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b [495599.252062] Lustre: mgs: This server is not able to keep up with request traffic (cpu-bound). [495599.260681] Lustre: Skipped 2 previous similar messages [495599.266004] Lustre: 8645:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=50 reqQ=0 recA=23, svcEst=34, delay=6369 [495599.276521] Lustre: 8645:0:(service.c:1541:ptlrpc_at_check_timed()) Skipped 2 previous similar messages [495599.287546] Lustre: 8645:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-2s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff8995f0793450 x1624699771026480/t0(0) o400->4e1094bc-3721-bb8b-579a-22f7fbee507d@10.9.105.15@o2ib4:150/0 lens 224/0 e 0 to 0 dl 1550012885 ref 2 fl New:/0/ffffffff rc 0/-1 [495599.318520] Lustre: 8645:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 110 previous similar messages [495599.636091] LNetError: 53887:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: tx_queue_nocred, 2 seconds [495599.646786] LNetError: 53887:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.201@o2ib7 (11): c: 0, oc: 8, rc: 8 [495599.658987] LNet: 53887:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 52 seconds [495599.669327] LNet: 53887:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 7 previous similar messages [495599.678754] LustreError: 53887:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff89731b6ea200 [495600.336791] LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server. 
[495600.353124] LustreError: Skipped 8 previous similar messages [495601.692771] NMI watchdog: BUG: soft lockup - CPU#26 stuck for 23s! [ldlm_cn02_017:56373] [495601.700947] Modules linked in: osp(OE) mdd(OE) mdt(OE) lustre(OE) mdc(OE) lod(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin dcdbas amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ses dm_multipath ipmi_si enclosure pcspkr dm_mod sg ipmi_devintf ccp i2c_piix4 ipmi_msghandler k10temp acpi_power_meter knem(OE) ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif [495601.773948] crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper ahci syscopyarea mlx5_core(OE) sysfillrect sysimgblt libahci mlxfw(OE) fb_sys_fops devlink ttm crct10dif_pclmul tg3 crct10dif_common mlx_compat(OE) drm megaraid_sas crc32c_intel libata ptp drm_panel_orientation_quirks pps_core mpt3sas(OE) raid_class scsi_transport_sas [last unloaded: osp] [495601.807369] CPU: 26 PID: 56373 Comm: ldlm_cn02_017 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1 [495601.820045] Hardware name: Dell Inc. 
PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018 [495601.827700] task: ffff8996a0202080 ti: ffff899730f50000 task.ti: ffff899730f50000 [495601.835266] RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 [495601.845284] RSP: 0018:ffff899730f53c38 EFLAGS: 00000246 [495601.850684] RAX: 0000000000000000 RBX: 00000001c0f61fd2 RCX: 0000000000d10000 [495601.857904] RDX: ffff8987ff81b780 RSI: 0000000001110001 RDI: ffff8987982e40dc [495601.865123] RBP: ffff899730f53c38 R08: ffff8987ff79b780 R09: 0000000000000000 [495601.872343] R10: ffff89593fc07600 R11: ffff899730f538ee R12: ffff899730f53be0 [495601.879562] R13: ffff8986346fafc0 R14: ffff899730f53ba0 R15: ffffffffc0c6a378 [495601.886782] FS: 00007f37fbb43700(0000) GS:ffff8987ff780000(0000) knlGS:0000000000000000 [495601.894954] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [495601.900787] CR2: 00007f3c4aea9000 CR3: 000000203b27e000 CR4: 00000000003407e0 [495601.908007] Call Trace: [495601.910552] [] queued_spin_lock_slowpath+0xb/0xf [495601.916911] [] _raw_spin_lock+0x20/0x30 [495601.922520] [] lock_res_and_lock+0x2c/0x50 [ptlrpc] [495601.929168] [] ldlm_lock_cancel+0x2d/0x1f0 [ptlrpc] [495601.935819] [] ldlm_request_cancel+0x19b/0x740 [ptlrpc] [495601.942813] [] ldlm_handle_cancel+0xba/0x250 [ptlrpc] [495601.949632] [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] [495601.956633] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [495601.964403] [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [495601.971277] [] ? default_wake_function+0x12/0x20 [495601.977629] [] ? __wake_up_common+0x5b/0x90 [495601.983584] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] [495601.989971] [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [495601.997451] [] kthread+0xd1/0xe0 [495602.002416] [] ? insert_kthread_work+0x40/0x40 [495602.008597] [] ret_from_fork_nospec_begin+0xe/0x21 [495602.015122] [] ? 
insert_kthread_work+0x40/0x40 [495602.021301] Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 60 b9 34 9e 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b [495604.167365] LustreError: 14180:0:(ldlm_lib.c:3264:target_bulk_io()) @@@ network error on bulk WRITE req@ffff896c43326c50 x1624736729643232/t0(0) o4->840cf4ae-a7c7-1e87-dcf3-94ef507467f5@10.8.26.18@o2ib6:177/0 lens 488/448 e 3 to 0 dl 1550012912 ref 1 fl Interpret:/0/0 rc 0/0 [495604.167370] LustreError: 14091:0:(sec.c:2362:sptlrpc_svc_unwrap_bulk()) @@@ truncated bulk GET 0(131072) req@ffff896c43327c50 x1624702312095424/t0(0) o4->c2c0536b-b17c-9118-170c-d010e3c3b183@10.9.107.15@o2ib4:177/0 lens 488/448 e 3 to 0 dl 1550012912 ref 1 fl Interpret:/0/0 rc 0/0 [495604.167375] LustreError: 14091:0:(sec.c:2362:sptlrpc_svc_unwrap_bulk()) Skipped 5 previous similar messages [495604.168956] LustreError: 13780:0:(ldlm_lib.c:3273:target_bulk_io()) @@@ truncated bulk READ 0(4096) req@ffff8997ab7ca100 x1624699872710656/t0(0) o37->d7f0cefd-f5dc-ae79-9fbe-8c42036c5092@10.9.105.21@o2ib4:164/0 lens 448/440 e 2 to 0 dl 1550012899 ref 1 fl Interpret:/0/0 rc 0/0 [495604.168959] LustreError: 13780:0:(ldlm_lib.c:3273:target_bulk_io()) Skipped 1 previous similar message [495604.792167] LustreError: 14064:0:(sec.c:2362:sptlrpc_svc_unwrap_bulk()) @@@ truncated bulk GET 0(131072) req@ffff89599a2a9050 x1624748413423296/t0(0) o4->3d122a91-53f0-f449-1f10-d08490897e63@10.9.106.65@o2ib4:184/0 lens 488/448 e 3 to 0 dl 1550012919 ref 1 fl Interpret:/2/0 rc 0/0 [495604.817169] LustreError: 14064:0:(sec.c:2362:sptlrpc_svc_unwrap_bulk()) Skipped 3 previous similar messages [495604.915855] INFO: rcu_sched self-detected stall on CPU [495604.915872] INFO: rcu_sched self-detected stall on CPU { 46} (t=60000 jiffies g=117588196 c=117588195 q=356814) [495604.915875] Task dump for CPU 34: [495604.915878] ldlm_cn02_006 R running task 0 54102 2 0x00000088 
[495604.915880] Call Trace: [495604.915935] [] ? ldlm_handle_cancel+0xba/0x250 [ptlrpc] [495604.915968] [] ? ldlm_cancel_handler+0x158/0x590 [ptlrpc] [495604.916008] [] ? ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [495604.916043] [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [495604.916048] [] ? wake_up_state+0x20/0x20 [495604.916083] [] ? ptlrpc_main+0xafc/0x1fc0 [ptlrpc] [495604.916118] [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [495604.916121] [] ? kthread+0xd1/0xe0 [495604.916123] [] ? insert_kthread_work+0x40/0x40 [495604.916126] [] ? ret_from_fork_nospec_begin+0xe/0x21 [495604.916128] [] ? insert_kthread_work+0x40/0x40 [495604.916129] Task dump for CPU 46: [495604.916131] ldlm_cn02_007 R running task 0 56230 2 0x00000088 [495604.916131] Call Trace: [495604.916135] [] sched_show_task+0xa8/0x110 [495604.916136] [] dump_cpu_task+0x39/0x70 [495604.916139] [] rcu_dump_cpu_stacks+0x90/0xd0 [495604.916141] [] rcu_check_callbacks+0x442/0x730 [495604.916145] [] ? tick_sched_do_timer+0x50/0x50 [495604.916147] [] update_process_times+0x46/0x80 [495604.916149] [] tick_sched_handle+0x30/0x70 [495604.916150] [] tick_sched_timer+0x39/0x80 [495604.916153] [] __hrtimer_run_queues+0xf3/0x270 [495604.916154] [] hrtimer_interrupt+0xaf/0x1d0 [495604.916158] [] local_apic_timer_interrupt+0x3b/0x60 [495604.916160] [] smp_apic_timer_interrupt+0x43/0x60 [495604.916162] [] apic_timer_interrupt+0x162/0x170 [495604.916165] [] ? native_queued_spin_lock_slowpath+0x122/0x200 [495604.916169] [] queued_spin_lock_slowpath+0xb/0xf [495604.916172] [] _raw_spin_lock+0x20/0x30 [495604.916201] [] lock_res_and_lock+0x2c/0x50 [ptlrpc] [495604.916232] [] ldlm_cancel_callback+0x92/0x330 [ptlrpc] [495604.916234] [] ? 
native_queued_spin_lock_slowpath+0x156/0x200 [495604.916265] [] ldlm_lock_cancel+0x56/0x1f0 [ptlrpc] [495604.916298] [] ldlm_request_cancel+0x19b/0x740 [ptlrpc] [495604.916331] [] ldlm_handle_cancel+0xba/0x250 [ptlrpc] [495604.916363] [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] [495604.916398] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [495604.916433] [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [495604.916434] [] ? wake_up_state+0x20/0x20 [495604.916469] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] [495604.916504] [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [495604.916505] [] kthread+0xd1/0xe0 [495604.916507] [] ? insert_kthread_work+0x40/0x40 [495604.916508] [] ret_from_fork_nospec_begin+0xe/0x21 [495604.916510] [] ? insert_kthread_work+0x40/0x40 [495605.235866] { 34} (t=60322 jiffies g=117588196 c=117588195 q=356823) [495605.242566] Task dump for CPU 34: [495605.245972] ldlm_cn02_006 R running task 0 54102 2 0x00000088 [495605.253155] Call Trace: [495605.255695] [] sched_show_task+0xa8/0x110 [495605.262083] [] dump_cpu_task+0x39/0x70 [495605.267570] [] rcu_dump_cpu_stacks+0x90/0xd0 [495605.273576] [] rcu_check_callbacks+0x442/0x730 [495605.279756] [] ? tick_sched_do_timer+0x50/0x50 [495605.285935] [] update_process_times+0x46/0x80 [495605.292027] [] tick_sched_handle+0x30/0x70 [495605.297860] [] tick_sched_timer+0x39/0x80 [495605.303608] [] __hrtimer_run_queues+0xf3/0x270 [495605.309786] [] hrtimer_interrupt+0xaf/0x1d0 [495605.315706] [] local_apic_timer_interrupt+0x3b/0x60 [495605.322319] [] smp_apic_timer_interrupt+0x43/0x60 [495605.328759] [] apic_timer_interrupt+0x162/0x170 [495605.335023] [] ? native_queued_spin_lock_slowpath+0x122/0x200 [495605.343137] [] queued_spin_lock_slowpath+0xb/0xf [495605.349490] [] _raw_spin_lock+0x20/0x30 [495605.355107] [] lock_res_and_lock+0x2c/0x50 [ptlrpc] [495605.361748] [] ldlm_cancel_callback+0x92/0x330 [ptlrpc] [495605.368711] [] ? 
native_queued_spin_lock_slowpath+0x126/0x200 [495605.376220] [] ldlm_lock_cancel+0x56/0x1f0 [ptlrpc] [495605.382864] [] ldlm_request_cancel+0x19b/0x740 [ptlrpc] [495605.389857] [] ldlm_handle_cancel+0xba/0x250 [ptlrpc] [495605.396678] [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] [495605.403675] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [495605.411448] [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [495605.418322] [] ? wake_up_state+0x20/0x20 [495605.424015] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] [495605.430402] [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [495605.437883] [] kthread+0xd1/0xe0 [495605.442849] [] ? insert_kthread_work+0x40/0x40 [495605.449028] [] ret_from_fork_nospec_begin+0xe/0x21 [495605.455554] [] ? insert_kthread_work+0x40/0x40 [495605.461733] Task dump for CPU 46: [495605.465138] ldlm_cn02_007 R running task 0 56230 2 0x00000088 [495605.472324] Call Trace: [495605.474895] [] ? ldlm_handle_cancel+0xba/0x250 [ptlrpc] [495605.481888] [] ? ldlm_cancel_handler+0x158/0x590 [ptlrpc] [495605.489058] [] ? ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [495605.497005] [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [495605.503880] [] ? wake_up_state+0x20/0x20 [495605.509573] [] ? ptlrpc_main+0xafc/0x1fc0 [ptlrpc] [495605.516133] [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [495605.523613] [] ? kthread+0xd1/0xe0 [495605.528755] [] ? insert_kthread_work+0x40/0x40 [495605.534935] [] ? ret_from_fork_nospec_begin+0xe/0x21 [495605.541632] [] ? 
insert_kthread_work+0x40/0x40 [495606.307384] Lustre: 12117:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 14s req@ffff8983806d8c50 x1624928968823728/t0(0) o41->fir-MDT0002-mdtlov_UUID@0@lo:0/0 lens 224/0 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1 [495606.329173] Lustre: 12117:0:(service.c:2011:ptlrpc_server_handle_req_in()) Skipped 48 previous similar messages [495606.684419] LustreError: 14101:0:(sec.c:2362:sptlrpc_svc_unwrap_bulk()) @@@ truncated bulk GET 0(131072) req@ffff896181790050 x1624748590660656/t0(0) o4->b3c5bc63-aa36-b5eb-1ad9-6c8f48fdb4c3@10.9.106.68@o2ib4:176/0 lens 488/448 e 3 to 0 dl 1550012911 ref 1 fl Interpret:/0/0 rc 0/0 [495606.709416] LustreError: 14101:0:(sec.c:2362:sptlrpc_svc_unwrap_bulk()) Skipped 1 previous similar message [495607.916602] LNetError: 53892:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-103, 0) [495607.929200] LNetError: 53892:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message [495608.164949] LustreError: 14149:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk WRITE after 20+25s req@ffff8997a1fe3c50 x1624701650995472/t0(0) o4->bde31afc-7c33-bc5f-967c-48dc424a9c49@10.9.107.42@o2ib4:136/0 lens 488/448 e 1 to 0 dl 1550012871 ref 1 fl Interpret:/0/0 rc 0/0 [495608.190120] LustreError: 14149:0:(ldlm_lib.c:3248:target_bulk_io()) Skipped 1 previous similar message [495608.958587] Lustre: 53920:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1550012889/real 1550012889] req@ffff8965a9225700 x1624928968810480/t0(0) o13->fir-OST002d-osc-MDT0000@10.0.10.108@o2ib7:7/4 lens 224/368 e 0 to 1 dl 1550012896 ref 1 fl Rpc:RX/2/ffffffff rc 0/-1 [495608.986877] Lustre: 53920:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 37 previous similar messages [495608.996800] Lustre: fir-OST002d-osc-MDT0000: Connection to fir-OST002d (at 10.0.10.108@o2ib7) was lost; in progress 
operations using this service will wait for recovery to complete [495609.012952] Lustre: Skipped 17 previous similar messages [495609.018443] LustreError: 14060:0:(sec.c:2362:sptlrpc_svc_unwrap_bulk()) @@@ truncated bulk GET 0(131072) req@ffff896181797450 x1624736214585104/t0(0) o4->efc6b332-a736-88e8-194a-588aa3e05348@10.8.21.36@o2ib6:170/0 lens 488/448 e 1 to 0 dl 1550012905 ref 1 fl Interpret:/0/0 rc 0/0 [495609.043347] LustreError: 14060:0:(sec.c:2362:sptlrpc_svc_unwrap_bulk()) Skipped 1 previous similar message [495609.356985] LustreError: 11497:0:(ldlm_lib.c:3248:target_bulk_io()) @@@ timeout on bulk WRITE after 22+27s req@ffff8997706a5450 x1624701642316496/t0(0) o4->b7f14aec-d465-a06a-f9b6-045d2e3bc764@10.9.107.39@o2ib4:135/0 lens 488/448 e 1 to 0 dl 1550012870 ref 1 fl Interpret:/0/0 rc 0/0 [495609.717759] LustreError: 53892:0:(events.c:305:request_in_callback()) event type 2, status -5, service mdt_io [495609.734050] LustreError: 14101:0:(pack_generic.c:590:__lustre_unpack_msg()) message length 0 too small for magic/version check [495609.745592] LustreError: 14101:0:(sec.c:2068:sptlrpc_svc_unwrap_request()) error unpacking request from 12345-10.9.107.38@o2ib4 x1624766012689104 [495614.455764] LNet: 53887:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 10.0.10.201@o2ib7: 22 seconds [495614.466106] LNet: 53887:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 15 previous similar messages [495614.476333] LustreError: 53887:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff89644d396e00 [495614.487350] LNetError: 53887:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) [495614.499945] LustreError: 53887:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8994dba72800 [495614.512471] LustreError: 53887:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff895b56506400 [495614.523353] LustreError: 
53887:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8961bb6ffa00 [495614.534257] LNetError: 11926:0:(lib-move.c:1980:lnet_handle_find_routed_path()) no route to 10.8.0.65@o2ib6 from [495614.545038] LustreError: 53887:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff896406a98200 [495614.556052] LustreError: 53887:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff895b56505000 [495614.566930] LustreError: 53887:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff895b56501200 [495614.577803] LustreError: 53887:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8961bb6ff200 [495614.588720] LustreError: 53887:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff897f64b0ea00 [495614.588782] LustreError: 14104:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff895dc8e52c00 [495614.588836] LustreError: 14443:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff895dc8e52800 [495614.621344] LustreError: 53887:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8970fc2e0200 [495614.632307] LustreError: 14169:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff897397fed400 [495615.034112] LNetError: 11926:0:(lib-move.c:1980:lnet_handle_find_routed_path()) no route to 10.8.0.65@o2ib6 from [495615.044718] LNetError: 11926:0:(lib-move.c:1980:lnet_handle_find_routed_path()) Skipped 49113 previous similar messages [495615.213239] LustreError: 14483:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff8997c3f3c000 [495615.592188] LustreError: 53887:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff899636f4d600 [495615.605343] LNetError: 53894:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) [495615.611774] LustreError: 
53887:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff896db1f59a00 [495615.612676] LustreError: 53887:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff895b56502a00 [495615.613354] LustreError: 53887:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff89636ce55a00 [495615.613460] LustreError: 53887:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff897e27b66c00 [495615.613484] LustreError: 53887:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8996f6a03000 [495615.614490] LustreError: 53887:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff89976af48000 [495615.614578] LustreError: 53887:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8993ea2e2200 [495615.614614] LustreError: 53887:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff899638fae600 [495615.614623] LustreError: 53887:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8973f933fc00 [495615.614652] LustreError: 53887:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff8996f6a07000 [495615.614703] LustreError: 53887:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff896a773d8800 [495615.737401] LNetError: 53894:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 3 previous similar messages [495616.350542] LNet: 53887:0:(o2iblnd_cb.c:1484:kiblnd_reconnect_peer()) Abort reconnection of 10.0.10.210@o2ib7: accepting [495616.604157] LNetError: 53887:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds [495616.614409] LNetError: 53887:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Skipped 5 previous similar messages [495616.624665] LNetError: 53887:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.102@o2ib7 (5): c: 0, oc: 0, rc: 8 [495616.636735] LNetError: 
53887:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Skipped 5 previous similar messages [495618.725804] LNet: 53887:0:(o2iblnd_cb.c:1484:kiblnd_reconnect_peer()) Abort reconnection of 10.0.10.102@o2ib7: connected [495618.725959] LNetError: 53889:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) [495618.726867] Lustre: 14488:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (33:32s); client may timeout. req@ffff896d7aea2850 x1624751951919152/t0(0) o4->037cf541-a575-03b3-3aed-140164784d71@10.9.107.61@o2ib4:139/0 lens 488/0 e 1 to 0 dl 1550012874 ref 1 fl Interpret:/2/ffffffff rc 0/-1 [495618.726869] Lustre: 14488:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 45 previous similar messages [495618.726906] LustreError: 14488:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.4.26@o2ib6: deadline 33:32s ago req@ffff897473e6c850 x1624705353458784/t0(0) o4->0e1cc7ee-ac14-2533-62de-8aa817b3cbc6@10.8.4.26@o2ib6:139/0 lens 488/0 e 1 to 0 dl 1550012874 ref 1 fl Interpret:/2/ffffffff rc 0/-1 [495618.726907] LustreError: 14488:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 1 previous similar message [495619.577435] LustreError: 53898:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff896db1f5be00 [495620.859316] Lustre: 56359:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-75s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff8981e82fc800 x1624700769944352/t0(0) o103->1cc72755-0b18-692a-013c-e5abb0ad9b59@10.9.106.44@o2ib4:98/0 lens 328/0 e 0 to 0 dl 1550012833 ref 1 fl Interpret:H/0/ffffffff rc 0/-1 [495620.890984] Lustre: 56359:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 17 previous similar messages [495620.901964] Lustre: ldlm_canceld: This server is not able to keep up with request traffic (cpu-bound). 
[495620.911357] Lustre: Skipped 1 previous similar message [495620.916587] Lustre: 56419:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=0 reqQ=0 recA=18, svcEst=111, delay=42 [495620.927023] Lustre: 56419:0:(service.c:1541:ptlrpc_at_check_timed()) Skipped 1 previous similar message [495621.030872] LustreError: 14123:0:(ldlm_lib.c:3264:target_bulk_io()) @@@ network error on bulk WRITE req@ffff896181793c50 x1624673811841200/t0(0) o4->04bbddc7-37a9-9b79-7fa8-c451901e5d15@10.9.101.67@o2ib4:201/0 lens 488/448 e 4 to 0 dl 1550012936 ref 1 fl Interpret:/0/0 rc 0/0 [495621.816280] NMI watchdog: BUG: soft lockup - CPU#40 stuck for 22s! [ldlm_cn00_021:56406] [495621.824457] Modules linked in: osp(OE) mdd(OE) mdt(OE) lustre(OE) mdc(OE) lod(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin dcdbas amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ses dm_multipath ipmi_si enclosure pcspkr dm_mod sg ipmi_devintf ccp i2c_piix4 ipmi_msghandler k10temp acpi_power_meter knem(OE) ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif [495621.897455] crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper ahci syscopyarea mlx5_core(OE) sysfillrect sysimgblt libahci mlxfw(OE) fb_sys_fops devlink ttm crct10dif_pclmul tg3 crct10dif_common mlx_compat(OE) drm megaraid_sas crc32c_intel libata ptp drm_panel_orientation_quirks pps_core mpt3sas(OE) raid_class scsi_transport_sas [last unloaded: osp] [495621.930878] CPU: 40 PID: 56406 Comm: ldlm_cn00_021 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 
#1 [495621.943563] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018 [495621.951215] task: ffff8997cb27d140 ti: ffff899692324000 task.ti: ffff899692324000 [495621.958782] RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 [495621.968808] RSP: 0018:ffff899692327c38 EFLAGS: 00000246 [495621.974208] RAX: 0000000000000000 RBX: 00000001c0f61fd2 RCX: 0000000001410000 [495621.981427] RDX: ffff8967ff0db780 RSI: 0000000001610001 RDI: ffff8987982e40dc [495621.988647] RBP: ffff899692327c38 R08: ffff8967ff09b780 R09: 0000000000000000 [495621.995866] R10: ffff89593fc07600 R11: ffffc4517da09800 R12: ffff899692327be0 [495622.003086] R13: ffff8986346fafc0 R14: ffff899692327ba0 R15: ffffffffc0c6a378 [495622.010307] FS: 00007f2533f15880(0000) GS:ffff8967ff080000(0000) knlGS:0000000000000000 [495622.018478] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [495622.024311] CR2: 00007f2521d7f99c CR3: 0000003dc7410000 CR4: 00000000003407e0 [495622.031533] Call Trace: [495622.034077] [] queued_spin_lock_slowpath+0xb/0xf [495622.040434] [] _raw_spin_lock+0x20/0x30 [495622.046048] [] lock_res_and_lock+0x2c/0x50 [ptlrpc] [495622.052698] [] ldlm_lock_cancel+0x2d/0x1f0 [ptlrpc] [495622.059347] [] ldlm_request_cancel+0x19b/0x740 [ptlrpc] [495622.066342] [] ldlm_handle_cancel+0xba/0x250 [ptlrpc] [495622.073159] [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] [495622.080160] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [495622.087938] [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [495622.094809] [] ? default_wake_function+0x12/0x20 [495622.101163] [] ? __wake_up_common+0x5b/0x90 [495622.107118] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] [495622.113504] [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [495622.120985] [] kthread+0xd1/0xe0 [495622.125948] [] ? insert_kthread_work+0x40/0x40 [495622.132131] [] ret_from_fork_nospec_begin+0xe/0x21 [495622.138655] [] ? 
insert_kthread_work+0x40/0x40 [495622.144833] Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 60 b9 34 9e 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b [495625.781377] NMI watchdog: BUG: soft lockup - CPU#36 stuck for 23s! [ldlm_cn00_009:56260] [495625.789554] Modules linked in: osp(OE) mdd(OE) mdt(OE) lustre(OE) mdc(OE) lod(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin dcdbas amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ses dm_multipath ipmi_si enclosure pcspkr dm_mod sg ipmi_devintf ccp i2c_piix4 ipmi_msghandler k10temp acpi_power_meter knem(OE) ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif [495625.862552] crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper ahci syscopyarea mlx5_core(OE) sysfillrect sysimgblt libahci mlxfw(OE) fb_sys_fops devlink ttm crct10dif_pclmul tg3 crct10dif_common mlx_compat(OE) drm megaraid_sas crc32c_intel libata ptp drm_panel_orientation_quirks pps_core mpt3sas(OE) raid_class scsi_transport_sas [last unloaded: osp] [495625.895975] CPU: 36 PID: 56260 Comm: ldlm_cn00_009 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1 [495625.908652] Hardware name: Dell Inc. 
PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018 [495625.916307] task: ffff8996c8fd30c0 ti: ffff8997fae34000 task.ti: ffff8997fae34000 [495625.923871] RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 [495625.933901] RSP: 0018:ffff8997fae37bb8 EFLAGS: 00000246 [495625.939300] RAX: 0000000000000000 RBX: ffff8997fae37bf0 RCX: 0000000001210000 [495625.946517] RDX: ffff8967fef5b780 RSI: 0000000000a10001 RDI: ffff8987982e40dc [495625.953738] RBP: ffff8997fae37bb8 R08: ffff8967ff05b780 R09: 0000000000000000 [495625.960957] R10: ffff89593fc07600 R11: ffffc4517da09800 R12: 0000000000000000 [495625.968178] R13: 0000000000000000 R14: 0000000000000002 R15: ffff8961737d7bc0 [495625.975398] FS: 00007fb564ea5700(0000) GS:ffff8967ff040000(0000) knlGS:0000000000000000 [495625.983569] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [495625.989401] CR2: 00007f85a1ee680d CR3: 0000003dc7410000 CR4: 00000000003407e0 [495625.996623] Call Trace: [495625.999167] [] queued_spin_lock_slowpath+0xb/0xf [495626.005526] [] _raw_spin_lock+0x20/0x30 [495626.011136] [] lock_res_and_lock+0x2c/0x50 [ptlrpc] [495626.017785] [] ldlm_cancel_callback+0x92/0x330 [ptlrpc] [495626.024747] [] ? native_queued_spin_lock_slowpath+0x126/0x200 [495626.032256] [] ldlm_lock_cancel+0x56/0x1f0 [ptlrpc] [495626.038901] [] ldlm_request_cancel+0x19b/0x740 [ptlrpc] [495626.045894] [] ldlm_handle_cancel+0xba/0x250 [ptlrpc] [495626.052713] [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] [495626.059713] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [495626.067494] [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [495626.074402] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] [495626.080789] [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [495626.088268] [] kthread+0xd1/0xe0 [495626.093233] [] ? insert_kthread_work+0x40/0x40 [495626.099413] [] ret_from_fork_nospec_begin+0xe/0x21 [495626.105938] [] ? 
insert_kthread_work+0x40/0x40 [495626.112115] Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 60 b9 34 9e 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b [495627.751440] LustreError: 14515:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff896facaabc50 x1624748147491376/t0(0) o4->9ac05ed6-3537-7c2f-d62f-875a75698c71@10.9.107.35@o2ib4:219/0 lens 488/448 e 3 to 0 dl 1550012954 ref 1 fl Interpret:/2/0 rc 0/0 [495627.775660] LustreError: 14515:0:(ldlm_lib.c:3258:target_bulk_io()) Skipped 20 previous similar messages [495627.785242] Lustre: fir-MDT0002: Bulk IO write error with 9ac05ed6-3537-7c2f-d62f-875a75698c71 (at 10.9.107.35@o2ib4), client will retry: rc = -110 [495627.798531] Lustre: Skipped 28 previous similar messages [495629.169564] LustreError: 14089:0:(ldlm_lib.c:3273:target_bulk_io()) @@@ truncated bulk READ 0(114688) req@ffff896c43325050 x1624700725865280/t0(0) o3->c50e9e63-bc69-ffb4-d9c5-0a1d77a8b849@10.9.106.60@o2ib4:201/0 lens 488/440 e 4 to 0 dl 1550012936 ref 1 fl Interpret:/0/0 rc 0/0 [495629.169608] LustreError: 14175:0:(sec.c:2362:sptlrpc_svc_unwrap_bulk()) @@@ truncated bulk GET 0(131072) req@ffff896d8aec6050 x1624740399756704/t0(0) o4->9b6ee593-357f-08be-650d-81734979fa6c@10.9.107.40@o2ib4:210/0 lens 488/448 e 4 to 0 dl 1550012945 ref 1 fl Interpret:/0/0 rc 0/0 [495629.219287] LustreError: 14089:0:(ldlm_lib.c:3273:target_bulk_io()) Skipped 3 previous similar messages [495629.228788] Lustre: fir-MDT0002: Bulk IO read error with c50e9e63-bc69-ffb4-d9c5-0a1d77a8b849 (at 10.9.106.60@o2ib4), client will retry: rc -110 [495630.456929] LustreError: 14564:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.9.103.26@o2ib4: deadline 6:43s ago req@ffff896db1f46850 x1624697931790752/t0(0) o4->676b9462-d0c7-96e9-ddb9-5790c315c2e9@10.9.103.26@o2ib4:140/0 lens 9320/0 e 0 to 0 dl 1550012875 ref 1 fl 
Interpret:/0/ffffffff rc 0/-1
[495630.488951] LustreError: 14564:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 73 previous similar messages
[495630.557940] LustreError: 14565:0:(tgt_handler.c:644:process_req_last_xid()) @@@ Unexpected xid 5c5b59418f930 vs. last_xid 5c5b59418f93f req@ffff896db1f43050 x1624758547970352/t0(0) o4->0343f8c1-f803-943e-238c-e83a0eb1a3ba@10.9.106.34@o2ib4:232/0 lens 1856/0 e 0 to 0 dl 1550012967 ref 1 fl Interpret:/2/ffffffff rc 0/-1
[495630.589760] LustreError: 12241:0:(ldlm_lib.c:3264:target_bulk_io()) @@@ network error on bulk WRITE req@ffff89599a2bb050 x1624753567489248/t0(0) o4->2105faed-3f3f-f302-d9e7-f8bce33a4b72@10.8.3.23@o2ib6:209/0 lens 504/448 e 4 to 0 dl 1550012944 ref 1 fl Interpret:/0/0 rc 0/0
[495633.624573] NMI watchdog: BUG: soft lockup - CPU#18 stuck for 22s! [ldlm_cn02_015:56367]
[495633.632743] Modules linked in: osp(OE) mdd(OE) mdt(OE) lustre(OE) mdc(OE) lod(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin dcdbas amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ses dm_multipath ipmi_si enclosure pcspkr dm_mod sg ipmi_devintf ccp i2c_piix4 ipmi_msghandler k10temp acpi_power_meter knem(OE) ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif
[495633.705744] crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper ahci syscopyarea mlx5_core(OE) sysfillrect sysimgblt libahci mlxfw(OE) fb_sys_fops devlink ttm crct10dif_pclmul tg3 crct10dif_common mlx_compat(OE) drm megaraid_sas crc32c_intel libata ptp drm_panel_orientation_quirks pps_core mpt3sas(OE) raid_class scsi_transport_sas [last unloaded: osp]
[495633.739209] CPU: 18 PID: 56367 Comm: ldlm_cn02_015 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1
[495633.751896] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018
[495633.759550] task: ffff89978ff630c0 ti: ffff8996882f0000 task.ti: ffff8996882f0000
[495633.767115] RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x156/0x200
[495633.777135] RSP: 0018:ffff8996882f3c38 EFLAGS: 00000202
[495633.782533] RAX: 0000000000000001 RBX: 00000001c0f61fd2 RCX: 0000000000910000
[495633.789753] RDX: 0000000000210001 RSI: 0000000000610001 RDI: ffff8987982e40dc
[495633.796972] RBP: ffff8996882f3c38 R08: ffff8987ff71b780 R09: ffff8967fef5b780
[495633.804191] R10: ffff89593fc07600 R11: ffffc451da189800 R12: ffff8996882f3be0
[495633.811413] R13: ffff8986346fafc0 R14: ffff8996882f3ba0 R15: ffffffffc0c6a378
[495633.818634] FS: 00007f37fbb43700(0000) GS:ffff8987ff700000(0000) knlGS:0000000000000000
[495633.826803] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[495633.832637] CR2: 00007ff4c078e000 CR3: 000000203b27e000 CR4: 00000000003407e0
[495633.839856] Call Trace:
[495633.842401] [] queued_spin_lock_slowpath+0xb/0xf
[495633.848769] [] _raw_spin_lock+0x20/0x30
[495633.854384] [] lock_res_and_lock+0x2c/0x50 [ptlrpc]
[495633.861023] [] ldlm_lock_cancel+0x2d/0x1f0 [ptlrpc]
[495633.867664] [] ldlm_request_cancel+0x19b/0x740 [ptlrpc]
[495633.874650] [] ldlm_handle_cancel+0xba/0x250 [ptlrpc]
[495633.881460] [] ldlm_cancel_handler+0x158/0x590 [ptlrpc]
[495633.888453] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[495633.896225] [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc]
[495633.903101] [] ? wake_up_state+0x20/0x20
[495633.908792] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
[495633.915178] [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc]
[495633.922662] [] kthread+0xd1/0xe0
[495633.927627] [] ? insert_kthread_work+0x40/0x40
[495633.933808] [] ret_from_fork_nospec_begin+0xe/0x21
[495633.940332] [] ? insert_kthread_work+0x40/0x40
[495633.946511] Code: 8b 08 4d 85 c9 74 04 41 0f 18 09 8b 17 0f b7 c2 85 c0 74 21 83 f8 03 75 10 eb 1a 66 2e 0f 1f 84 00 00 00 00 00 85 c0 74 0c f3 90 <8b> 17 0f b7 c2 83 f8 03 75 f0 be 01 00 00 00 eb 15 66 0f 1f 84
[495654.172120] LustreError: 14439:0:(sec.c:2362:sptlrpc_svc_unwrap_bulk()) @@@ truncated bulk GET 0(131072) req@ffff8961f463c850 x1624737499098400/t0(0) o4->7d78b5a7-dae3-eca3-5a98-d1b9fe987149@10.8.17.22@o2ib6:213/0 lens 488/448 e 3 to 0 dl 1550012948 ref 1 fl Interpret:/0/0 rc 0/0
[495654.172131] LustreError: 14122:0:(ldlm_lib.c:3264:target_bulk_io()) @@@ network error on bulk WRITE req@ffff8961f463f850 x1624761337641680/t0(0) o4->83e72c3d-872c-7a5f-f5c1-edf566d41d60@10.9.107.1@o2ib4:214/0 lens 488/448 e 2 to 0 dl 1550012949 ref 1 fl Interpret:/0/0 rc 0/0
[495654.221529] LustreError: 14439:0:(sec.c:2362:sptlrpc_svc_unwrap_bulk()) Skipped 28 previous similar messages
[495679.987247] Lustre: 53941:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1550012954/real 1550012956] req@ffff897ed02f2a00 x1624928977305264/t0(0) o13->fir-OST0019-osc-MDT0000@10.0.10.106@o2ib7:7/4 lens 224/368 e 0 to 1 dl 1550012967 ref 1 fl Rpc:RX/0/ffffffff rc 0/-1
[495680.015542] Lustre: 53941:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 112183 previous similar messages
[495680.025814] Lustre: fir-OST0019-osc-MDT0000: Connection to fir-OST0019 (at 10.0.10.106@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
[495680.041961] Lustre: Skipped 19 previous similar messages
[495680.811288] LustreError: 11497:0:(sec.c:2362:sptlrpc_svc_unwrap_bulk()) @@@ truncated bulk GET 0(131072) req@ffff89974d6a3450 x1624732667694928/t0(0) o4->fced5f19-499d-5f3d-efe5-faf9d4f8cdcd@10.9.107.50@o2ib4:239/0 lens 488/448 e 3 to 0 dl 1550012974 ref 1 fl Interpret:/2/0 rc 0/0
[495691.503515] LustreError: 54055:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 128s: evicting client at 10.9.106.44@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff89842a230480/0x1a4b7ac7dd43f9e0 lrc: 3/0,0 mode: PR/PR res: [0x2c000168c:0x1a58:0x0].0x0 bits 0x40/0x0 rrc: 448270 type: IBT flags: 0x60000400010020 nid: 10.9.106.44@o2ib4 remote: 0xeee213bfc8a25ea0 expref: 458108 pid: 12310 timeout: 495649 lvb_type: 0
[495691.542296] LustreError: 54055:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 277931 previous similar messages
[495691.671781] LustreError: 56364:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.9.106.44@o2ib4 arrived at 1550012979 with bad export cookie 1894743065284034757
[495691.687420] LustreError: 56364:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 15 previous similar messages
[495692.115090] Lustre: 12554:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 12s req@ffff8964913c8c00 x1623971449011360/t0(0) o35->0ccdc4e2-9749-c9a5-afb4-85874ce74d6c@10.0.10.3@o2ib7:0/0 lens 392/0 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1
[495692.139466] Lustre: mdt_readpage: This server is not able to keep up with request traffic (cpu-bound).
[495692.148856] Lustre: 14597:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=1 reqQ=0 recA=9, svcEst=60, delay=0
[495692.159041] Lustre: 14597:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-2s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff895e95bc0f00 x1624672830833424/t0(0) o35->ac16def5-1a59-80e5-2e16-45b58fcd0330@10.8.2.8@o2ib6:243/0 lens 392/0 e 0 to 0 dl 1550012978 ref 2 fl New:/0/ffffffff rc 0/-1
[495692.189783] Lustre: 14597:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 116 previous similar messages
[495692.251325] LustreError: 14598:0:(tgt_handler.c:644:process_req_last_xid()) @@@ Unexpected xid 5c5a92523b2f0 vs. last_xid 5c5a92523b2ff req@ffff895afca72c50 x1624705146794736/t0(0) o4->44c34e5e-d358-e5f1-f032-e5118620e81b@10.8.24.9@o2ib6:256/0 lens 1648/0 e 0 to 0 dl 1550012991 ref 1 fl Interpret:/2/ffffffff rc 0/-1
[495693.252489] LustreError: 14606:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.2.8@o2ib6: deadline 6:3s ago req@ffff895e95bc0f00 x1624672830833424/t0(0) o35->ac16def5-1a59-80e5-2e16-45b58fcd0330@10.8.2.8@o2ib6:243/0 lens 392/0 e 0 to 0 dl 1550012978 ref 1 fl Interpret:/0/ffffffff rc 0/-1
[495693.284066] Lustre: 14606:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:3s); client may timeout. req@ffff895e95bc0f00 x1624672830833424/t0(0) o35->ac16def5-1a59-80e5-2e16-45b58fcd0330@10.8.2.8@o2ib6:243/0 lens 392/0 e 0 to 0 dl 1550012978 ref 1 fl Interpret:/0/ffffffff rc 0/-1
[495693.312714] Lustre: 14606:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 127 previous similar messages
[495693.367695] LustreError: 14118:0:(tgt_handler.c:644:process_req_last_xid()) @@@ Unexpected xid 5c5b0ad7cac70 vs. last_xid 5c5b0ad7cac9f req@ffff89867eb70050 x1624737499098224/t0(0) o4->7d78b5a7-dae3-eca3-5a98-d1b9fe987149@10.8.17.22@o2ib6:256/0 lens 9320/0 e 0 to 0 dl 1550012991 ref 1 fl Interpret:/2/ffffffff rc 0/-1
[495693.752932] LNetError: 53898:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0)
[495693.765536] LNetError: 53898:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 2 previous similar messages
[495695.672201] LustreError: 54045:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.9.106.44@o2ib4 arrived at 1550012983 with bad export cookie 1894743065284034757
[495695.687841] LustreError: 54045:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 156502 previous similar messages
[495717.835689] NMI watchdog: BUG: soft lockup - CPU#42 stuck for 23s! [ldlm_bl_88:11064]
[495717.843744] Modules linked in: osp(OE) mdd(OE) mdt(OE) lustre(OE) mdc(OE) lod(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin dcdbas amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ses dm_multipath ipmi_si enclosure pcspkr dm_mod sg ipmi_devintf ccp i2c_piix4 ipmi_msghandler k10temp acpi_power_meter knem(OE) ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif
[495717.916954] crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper ahci syscopyarea mlx5_core(OE) sysfillrect sysimgblt libahci mlxfw(OE) fb_sys_fops devlink ttm crct10dif_pclmul tg3 crct10dif_common mlx_compat(OE) drm megaraid_sas crc32c_intel libata ptp drm_panel_orientation_quirks pps_core mpt3sas(OE) raid_class scsi_transport_sas [last unloaded: osp]
[495717.950376] CPU: 42 PID: 11064 Comm: ldlm_bl_88 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1
[495717.962795] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018
[495717.970447] task: ffff899757b41040 ti: ffff8996e9b90000 task.ti: ffff8996e9b90000
[495717.978012] RIP: 0010:[] [] ldlm_inodebits_compat_queue+0x188/0x440 [ptlrpc]
[495717.988443] RSP: 0018:ffff8996e9b93bb0 EFLAGS: 00000282
[495717.993964] RAX: 0000000000000001 RBX: ffff8980566b5a60 RCX: ffff8980566b5a60
[495718.001270] RDX: ffff8996e9b93ca8 RSI: ffff8977dd5d4a40 RDI: ffff897e7f38d340
[495718.008635] RBP: ffff8996e9b93c08 R08: ffff8996e9b93ca8 R09: 00000000c00030e4
[495718.015869] R10: 0000000000000064 R11: ffff8977dd5d4a40 R12: ffff8996e9b93ca8
[495718.023090] R13: 00000000c00030e4 R14: 0000000000000064 R15: ffff8977dd5d4a40
[495718.030309] FS: 00007f37fbb43700(0000) GS:ffff8987ff880000(0000) knlGS:0000000000000000
[495718.038480] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[495718.044316] CR2: 00007fb56a20c000 CR3: 000000203b27e000 CR4: 00000000003407e0
[495718.051534] Call Trace:
[495718.054122] [] ldlm_process_inodebits_lock+0x93/0x3d0 [ptlrpc]
[495718.061726] [] ? ldlm_inodebits_compat_queue+0x440/0x440 [ptlrpc]
[495718.069585] [] ldlm_reprocess_queue+0x1be/0x3f0 [ptlrpc]
[495718.076664] [] __ldlm_reprocess_all+0x10b/0x380 [ptlrpc]
[495718.083744] [] ldlm_cancel_lock_for_export.isra.26+0x1c2/0x390 [ptlrpc]
[495718.092123] [] ldlm_export_cancel_blocked_locks+0x121/0x200 [ptlrpc]
[495718.100238] [] ldlm_bl_thread_main+0x112/0x700 [ptlrpc]
[495718.107208] [] ? wake_up_state+0x20/0x20
[495718.112904] [] ? ldlm_handle_bl_callback+0x530/0x530 [ptlrpc]
[495718.120387] [] kthread+0xd1/0xe0
[495718.125350] [] ? insert_kthread_work+0x40/0x40
[495718.131533] [] ret_from_fork_nospec_begin+0xe/0x21
[495718.138056] [] ? insert_kthread_work+0x40/0x40
[495718.144234] Code: 74 0e 4c 89 f2 48 89 de 4c 89 ff e8 d3 a2 fc ff 49 8b 87 10 02 00 00 4c 8d a8 f0 fd ff ff 4d 39 fd 74 2b 49 83 bd a8 00 00 00 00 <74> 0e 4c 89 f2 48 89 de 4c 89 ef e8 a8 a2 fc ff 4d 8b ad 10 02
[495721.278805] LustreError: 54055:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 109s: evicting client at 10.9.106.44@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff897e28a05100/0x1a4b7ac7dd43f675 lrc: 3/0,0 mode: PR/PR res: [0x2c000168c:0x1a58:0x0].0x0 bits 0x40/0x0 rrc: 447891 type: IBT flags: 0x60000400010020 nid: 10.9.106.44@o2ib4 remote: 0xeee213bfc8a25e68 expref: 447974 pid: 12289 timeout: 495708 lvb_type: 0
[495721.317600] LustreError: 54055:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 89 previous similar messages
[495739.279246] LustreError: 54055:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 120s: evicting client at 10.9.106.44@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8964abf6de80/0x1a4b7ac7dd43f44c lrc: 3/0,0 mode: PR/PR res: [0x2c000168c:0x1a58:0x0].0x0 bits 0x40/0x0 rrc: 447677 type: IBT flags: 0x60000400010020 nid: 10.9.106.44@o2ib4 remote: 0xeee213bfc8a25e45 expref: 447760 pid: 12264 timeout: 495726 lvb_type: 0
[495739.318029] LustreError: 54055:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 4 previous similar messages
[495739.326235] LNet: Service thread pid 12254 was inactive for 200.39s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[495739.326237] Pid: 12254, comm: mdt01_054 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
[495739.326237] Call Trace:
[495739.326281] [] ldlm_completion_ast+0x5b1/0x920 [ptlrpc]
[495739.326309] [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc]
[495739.326330] [] mdt_dom_discard_data+0x101/0x130 [mdt]
[495739.326340] [] mdt_reint_unlink+0x331/0x14a0 [mdt]
[495739.326351] [] mdt_reint_rec+0x83/0x210 [mdt]
[495739.326361] [] mdt_reint_internal+0x6e3/0xaf0 [mdt]
[495739.326371] [] mdt_reint+0x67/0x140 [mdt]
[495739.326414] [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[495739.326447] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[495739.326479] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
[495739.326483] [] kthread+0xd1/0xe0
[495739.326487] [] ret_from_fork_nospec_begin+0xe/0x21
[495739.326513] [] 0xffffffffffffffff
[495739.326515] LustreError: dumping log to /tmp/lustre-log.1550013027.12254
[495745.836391] NMI watchdog: BUG: soft lockup - CPU#42 stuck for 23s! [ldlm_bl_88:11064]
[495745.844306] Modules linked in: osp(OE) mdd(OE) mdt(OE) lustre(OE) mdc(OE) lod(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin dcdbas amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ses dm_multipath ipmi_si enclosure pcspkr dm_mod sg ipmi_devintf ccp i2c_piix4 ipmi_msghandler k10temp acpi_power_meter knem(OE) ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif
[495745.917307] crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper ahci syscopyarea mlx5_core(OE) sysfillrect sysimgblt libahci mlxfw(OE) fb_sys_fops devlink ttm crct10dif_pclmul tg3 crct10dif_common mlx_compat(OE) drm megaraid_sas crc32c_intel libata ptp drm_panel_orientation_quirks pps_core mpt3sas(OE) raid_class scsi_transport_sas [last unloaded: osp]
[495745.950740] CPU: 42 PID: 11064 Comm: ldlm_bl_88 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1
[495745.963174] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018
[495745.970827] task: ffff899757b41040 ti: ffff8996e9b90000 task.ti: ffff8996e9b90000
[495745.978392] RIP: 0010:[] [] ldlm_add_ast_work_item+0x4e/0x3d0 [ptlrpc]
[495745.988282] RSP: 0018:ffff8996e9b93b80 EFLAGS: 00000282
[495745.993680] RAX: 0000000000000001 RBX: ffff8987ff89f1d0 RCX: ffff8980566b5a60
[495746.000900] RDX: ffff8996e9b93ca8 RSI: ffff8977dd5d4a40 RDI: ffff89941ae54380
[495746.008120] RBP: ffff8996e9b93ba0 R08: ffff8996e9b93ca8 R09: 00000000c00030e4
[495746.015340] R10: 0000000000000064 R11: ffff8977dd5d4a40 R12: ffff8996e9b93b98
[495746.022558] R13: ffffc4523e4b9c00 R14: ffffffff9d994d0d R15: ffff8996e9b93af0
[495746.029780] FS: 00007f37fbb43700(0000) GS:ffff8987ff880000(0000) knlGS:0000000000000000
[495746.037951] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[495746.043785] CR2: 00007fb56a20c000 CR3: 000000203b27e000 CR4: 00000000003407e0
[495746.051005] Call Trace:
[495746.053598] [] ldlm_inodebits_compat_queue+0x198/0x440 [ptlrpc]
[495746.061286] [] ldlm_process_inodebits_lock+0x93/0x3d0 [ptlrpc]
[495746.068883] [] ? ldlm_inodebits_compat_queue+0x440/0x440 [ptlrpc]
[495746.076743] [] ldlm_reprocess_queue+0x1be/0x3f0 [ptlrpc]
[495746.083822] [] __ldlm_reprocess_all+0x10b/0x380 [ptlrpc]
[495746.090904] [] ldlm_cancel_lock_for_export.isra.26+0x1c2/0x390 [ptlrpc]
[495746.099291] [] ldlm_export_cancel_blocked_locks+0x121/0x200 [ptlrpc]
[495746.107410] [] ldlm_bl_thread_main+0x112/0x700 [ptlrpc]
[495746.114372] [] ? wake_up_state+0x20/0x20
[495746.120071] [] ? ldlm_handle_bl_callback+0x530/0x530 [ptlrpc]
[495746.127552] [] kthread+0xd1/0xe0
[495746.132520] [] ? insert_kthread_work+0x40/0x40
[495746.138707] [] ret_from_fork_nospec_begin+0xe/0x21
[495746.145232] [] ? insert_kthread_work+0x40/0x40
[495746.151409] Code: 05 45 bd d4 ff 01 0f 85 c1 00 00 00 48 8b 43 48 8b 40 1c 85 c0 0f 84 47 02 00 00 4d 85 e4 0f 84 91 01 00 00 4c 8b ab 00 01 00 00 <41> f6 c5 20 75 7f 48 85 db 0f 84 2d 02 00 00 f6 05 06 bd d4 ff
[495752.497557] INFO: rcu_sched self-detected stall on CPU
[495752.498559] INFO: rcu_sched detected stalls on CPUs/tasks:
[495752.498559] {
[495752.498561] 42
[495752.498562] }
[495752.498572] (detected by 0, t=60002 jiffies, g=117588231, c=117588230, q=2142210)
[495752.498574] Task dump for CPU 42:
[495752.498575] ldlm_bl_88 R
[495752.498576] running task
[495752.498577] 0 11064 2 0x00000088
[495752.498579] Call Trace:
[495752.498624] [] ? ldlm_export_cancel_blocked_locks+0x121/0x200 [ptlrpc]
[495752.498655] [] ? ldlm_bl_thread_main+0x112/0x700 [ptlrpc]
[495752.498661] [] ? wake_up_state+0x20/0x20
[495752.498691] [] ? ldlm_handle_bl_callback+0x530/0x530 [ptlrpc]
[495752.498694] [] ? kthread+0xd1/0xe0
[495752.498696] [] ? insert_kthread_work+0x40/0x40
[495752.498700] [] ? ret_from_fork_nospec_begin+0xe/0x21
[495752.498701] [] ? insert_kthread_work+0x40/0x40
[495752.580745] {
[495752.582531] 42} (t=60086 jiffies g=117588231 c=117588230 q=2144870)
[495752.587732] Task dump for CPU 42:
[495752.591136] ldlm_bl_88 R running task 0 11064 2 0x00000088
[495752.598323] Call Trace:
[495752.600862] [] sched_show_task+0xa8/0x110
[495752.607252] [] dump_cpu_task+0x39/0x70
[495752.612739] [] rcu_dump_cpu_stacks+0x90/0xd0
[495752.618741] [] rcu_check_callbacks+0x442/0x730
[495752.624924] [] ? tick_sched_do_timer+0x50/0x50
[495752.631102] [] update_process_times+0x46/0x80
[495752.637194] [] tick_sched_handle+0x30/0x70
[495752.643029] [] tick_sched_timer+0x39/0x80
[495752.648776] [] __hrtimer_run_queues+0xf3/0x270
[495752.654960] [] hrtimer_interrupt+0xaf/0x1d0
[495752.660884] [] local_apic_timer_interrupt+0x3b/0x60
[495752.667505] [] smp_apic_timer_interrupt+0x43/0x60
[495752.673953] [] apic_timer_interrupt+0x162/0x170
[495752.680215] [] ? ldlm_inodebits_compat_queue+0x188/0x440 [ptlrpc]
[495752.688751] [] ldlm_process_inodebits_lock+0x93/0x3d0 [ptlrpc]
[495752.696359] [] ? ldlm_inodebits_compat_queue+0x440/0x440 [ptlrpc]
[495752.704221] [] ldlm_reprocess_queue+0x1be/0x3f0 [ptlrpc]
[495752.711298] [] __ldlm_reprocess_all+0x10b/0x380 [ptlrpc]
[495752.718372] [] ldlm_cancel_lock_for_export.isra.26+0x1c2/0x390 [ptlrpc]
[495752.726755] [] ldlm_export_cancel_blocked_locks+0x121/0x200 [ptlrpc]
[495752.734878] [] ldlm_bl_thread_main+0x112/0x700 [ptlrpc]
[495752.741842] [] ? wake_up_state+0x20/0x20
[495752.747537] [] ? ldlm_handle_bl_callback+0x530/0x530 [ptlrpc]
[495752.755022] [] kthread+0xd1/0xe0
[495752.759988] [] ? insert_kthread_work+0x40/0x40
[495752.766167] [] ret_from_fork_nospec_begin+0xe/0x21
[495752.772694] [] ? insert_kthread_work+0x40/0x40
[495771.280046] LustreError: 54055:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 139s: evicting client at 10.9.106.44@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8961dd2f0fc0/0x1a4b7ac7dd43ec65 lrc: 3/0,0 mode: PR/PR res: [0x2c000168c:0x1a58:0x0].0x0 bits 0x40/0x0 rrc: 447305 type: IBT flags: 0x60000400010020 nid: 10.9.106.44@o2ib4 remote: 0xeee213bfc8a25dd5 expref: 447388 pid: 12339 timeout: 495758 lvb_type: 0
[495771.318825] LustreError: 54055:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 15 previous similar messages
[495777.837196] NMI watchdog: BUG: soft lockup - CPU#42 stuck for 23s! [ldlm_bl_88:11064]
[495777.845112] Modules linked in: osp(OE) mdd(OE) mdt(OE) lustre(OE) mdc(OE) lod(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin dcdbas amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ses dm_multipath ipmi_si enclosure pcspkr dm_mod sg ipmi_devintf ccp i2c_piix4 ipmi_msghandler k10temp acpi_power_meter knem(OE) ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif
[495777.918114] crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper ahci syscopyarea mlx5_core(OE) sysfillrect sysimgblt libahci mlxfw(OE) fb_sys_fops devlink ttm crct10dif_pclmul tg3 crct10dif_common mlx_compat(OE) drm megaraid_sas crc32c_intel libata ptp drm_panel_orientation_quirks pps_core mpt3sas(OE) raid_class scsi_transport_sas [last unloaded: osp]
[495777.951534] CPU: 42 PID: 11064 Comm: ldlm_bl_88 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1
[495777.963953] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018
[495777.971608] task: ffff899757b41040 ti: ffff8996e9b90000 task.ti: ffff8996e9b90000
[495777.979172] RIP: 0010:[] [] ldlm_inodebits_compat_queue+0x188/0x440 [ptlrpc]
[495777.989580] RSP: 0018:ffff8996e9b93bb0 EFLAGS: 00000282
[495777.994981] RAX: 0000000000000001 RBX: ffff8980566b5a60 RCX: ffff8980566b5a60
[495778.002199] RDX: ffff8996e9b93ca8 RSI: ffff8977dd5d4a40 RDI: ffff8971f7a318c0
[495778.009421] RBP: ffff8996e9b93c08 R08: ffff8996e9b93ca8 R09: 00000000c00030e4
[495778.016638] R10: 0000000000000064 R11: ffff8977dd5d4a40 R12: ffff8996e9b93ca8
[495778.023857] R13: 00000000c00030e4 R14: 0000000000000064 R15: ffff8977dd5d4a40
[495778.031079] FS: 00007f37fbb43700(0000) GS:ffff8987ff880000(0000) knlGS:0000000000000000
[495778.039251] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[495778.045091] CR2: 00007fb56a20c000 CR3: 000000203b27e000 CR4: 00000000003407e0
[495778.052312] Call Trace:
[495778.054900] [] ldlm_process_inodebits_lock+0x93/0x3d0 [ptlrpc]
[495778.062512] [] ? ldlm_inodebits_compat_queue+0x440/0x440 [ptlrpc]
[495778.070369] [] ldlm_reprocess_queue+0x1be/0x3f0 [ptlrpc]
[495778.077452] [] __ldlm_reprocess_all+0x10b/0x380 [ptlrpc]
[495778.084530] [] ldlm_cancel_lock_for_export.isra.26+0x1c2/0x390 [ptlrpc]
[495778.092915] [] ldlm_export_cancel_blocked_locks+0x121/0x200 [ptlrpc]
[495778.101039] [] ldlm_bl_thread_main+0x112/0x700 [ptlrpc]
[495778.108002] [] ? wake_up_state+0x20/0x20
[495778.113701] [] ? ldlm_handle_bl_callback+0x530/0x530 [ptlrpc]
[495778.121182] [] kthread+0xd1/0xe0
[495778.126147] [] ? insert_kthread_work+0x40/0x40
[495778.132329] [] ret_from_fork_nospec_begin+0xe/0x21
[495778.138861] [] ? insert_kthread_work+0x40/0x40
[495778.145039] Code: 74 0e 4c 89 f2 48 89 de 4c 89 ff e8 d3 a2 fc ff 49 8b 87 10 02 00 00 4c 8d a8 f0 fd ff ff 4d 39 fd 74 2b 49 83 bd a8 00 00 00 00 <74> 0e 4c 89 f2 48 89 de 4c 89 ef e8 a8 a2 fc ff 4d 8b ad 10 02
[495957.289712] LustreError: 12254:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1550012945, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8977dd5d4a40/0x1a4b7ac7ddcc3251 lrc: 3/0,1 mode: --/PW res: [0x2c000168c:0x1a58:0x0].0x0 bits 0x40/0x0 rrc: 445105 type: IBT flags: 0x40010080000000 nid: local remote: 0x0 expref: -99 pid: 12254 timeout: 0 lvb_type: 0
[496225.418567] Lustre: fir-MDT0000: Client b5ad8834-3caa-3b3a-aeae-c877aabb1ef0 (at 10.8.2.6@o2ib6) reconnecting
[496225.428576] Lustre: Skipped 3541 previous similar messages
[496225.434192] Lustre: fir-MDT0000: Connection restored to a61413e9-0563-2e7a-41f8-ffa5690ecb59 (at 10.8.2.6@o2ib6)
[496225.444474] Lustre: Skipped 5313 previous similar messages
[496241.896501] Lustre: MGS: Received new LWP connection from 10.8.2.4@o2ib6, removing former export from same NID
[496241.906613] Lustre: Skipped 1697 previous similar messages
[496397.958733] Lustre: 12301:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22
[496480.568013] Lustre: 12373:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22
[496514.474591] Lustre: 11701:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22
[496514.486348] Lustre: 11701:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message
[496827.967570] Lustre: fir-MDT0002: haven't heard from client 68217227-70ae-290e-e4df-14b7516db509 (at 10.8.3.11@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897c99f20400, cur 1550014116 expire 1550013966 last 1550013889
[496827.989295] Lustre: Skipped 2 previous similar messages
[496864.578845] Lustre: fir-MDT0000: Client a2c53951-d174-3b52-a3ee-252826248ac1 (at 10.9.112.6@o2ib4) reconnecting
[496864.589029] Lustre: Skipped 3 previous similar messages
[496864.594383] Lustre: fir-MDT0000: Connection restored to (at 10.9.112.6@o2ib4)
[496864.601710] Lustre: Skipped 5 previous similar messages
[497464.964847] Lustre: fir-MDT0000: Connection restored to ebca69ce-60cf-b682-0b00-8cb081d19aed (at 10.8.3.11@o2ib6)
[497464.975195] Lustre: Skipped 2 previous similar messages
[497493.075367] Lustre: fir-MDT0000: Client 691c85d2-0e39-9e6d-1bfd-ecbaccae5366 (at 10.8.2.27@o2ib6) reconnecting
[497594.956880] LustreError: 12324:0:(osp_object.c:1458:osp_declare_create()) ASSERTION( o->opo_reserved == 0 ) failed:
[497594.967490] LustreError: 12324:0:(osp_object.c:1458:osp_declare_create()) LBUG
[497594.974807] Pid: 12324, comm: mdt01_074 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
[497594.984636] Call Trace:
[497594.987187] [] libcfs_call_trace+0x8c/0xc0 [libcfs]
[497594.993859] [] lbug_with_loc+0x4c/0xa0 [libcfs]
[497595.000177] [] osp_declare_create+0x5a5/0x5b0 [osp]
[497595.006833] [] lod_sub_declare_create+0xdf/0x210 [lod]
[497595.013748] [] lod_qos_prep_create+0x15d4/0x1890 [lod]
[497595.020662] [] lod_declare_instantiate_components+0x9a/0x1d0 [lod]
[497595.028614] [] lod_declare_layout_change+0xb65/0x10f0 [lod]
[497595.035988] [] mdd_declare_layout_change+0x62/0x120 [mdd]
[497595.043172] [] mdd_layout_change+0x882/0x1000 [mdd]
[497595.049830] [] mdt_layout_change+0x337/0x430 [mdt]
[497595.056398] [] mdt_intent_layout+0x7ee/0xcc0 [mdt]
[497595.062968] [] mdt_intent_policy+0x2e8/0xd00 [mdt]
[497595.069549] [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc]
[497595.076400] [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc]
[497595.083597] [] tgt_enqueue+0x62/0x210 [ptlrpc]
[497595.089851] [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[497595.096881] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[497595.104679] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
[497595.111095] [] kthread+0xd1/0xe0
[497595.116096] [] ret_from_fork_nospec_begin+0xe/0x21
[497595.122657] [] 0xffffffffffffffff
[497595.127761] Kernel panic - not syncing: LBUG
[497595.132122] CPU: 41 PID: 12324 Comm: mdt01_074 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1
[497595.144451] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018
[497595.152106] Call Trace:
[497595.154649] [] dump_stack+0x19/0x1b
[497595.159882] [] panic+0xe8/0x21f
[497595.164763] [] lbug_with_loc+0x9b/0xa0 [libcfs]
[497595.171040] [] osp_declare_create+0x5a5/0x5b0 [osp]
[497595.177668] [] lod_sub_declare_create+0xdf/0x210 [lod]
[497595.184541] [] ? list_del+0xd/0x30
[497595.189693] [] lod_qos_prep_create+0x15d4/0x1890 [lod]
[497595.196569] [] ? ___slab_alloc+0x209/0x4f0
[497595.202421] [] ? class_handle_hash+0xab/0x2f0 [obdclass]
[497595.209474] [] ? wake_up_state+0x20/0x20
[497595.215152] [] ? lu_buf_alloc+0x48/0x320 [obdclass]
[497595.221803] [] ? ldlm_cli_enqueue_local+0x27d/0x870 [ptlrpc]
[497595.229208] [] lod_declare_instantiate_components+0x9a/0x1d0 [lod]
[497595.237131] [] lod_declare_layout_change+0xb65/0x10f0 [lod]
[497595.244442] [] mdd_declare_layout_change+0x62/0x120 [mdd]
[497595.251584] [] mdd_layout_change+0x882/0x1000 [mdd]
[497595.258213] [] ? mdt_object_lock_internal+0x70/0x3e0 [mdt]
[497595.265444] [] mdt_layout_change+0x337/0x430 [mdt]
[497595.271978] [] mdt_intent_layout+0x7ee/0xcc0 [mdt]
[497595.278543] [] ? lustre_msg_buf+0x17/0x60 [ptlrpc]
[497595.285083] [] mdt_intent_policy+0x2e8/0xd00 [mdt]
[497595.291637] [] ? ldlm_lock_create+0xa4/0xa40 [ptlrpc]
[497595.298442] [] ? mdt_intent_open+0x350/0x350 [mdt]
[497595.304999] [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc]
[497595.311794] [] ? cfs_hash_bd_add_locked+0x63/0x80 [libcfs]
[497595.319018] [] ? cfs_hash_add+0xbe/0x1a0 [libcfs]
[497595.325490] [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc]
[497595.332662] [] ? lustre_swab_ldlm_lock_desc+0x30/0x30 [ptlrpc]
[497595.340268] [] tgt_enqueue+0x62/0x210 [ptlrpc]
[497595.346488] [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[497595.353481] [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc]
[497595.361139] [] ? ktime_get_real_seconds+0xe/0x10 [libcfs]
[497595.368309] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[497595.376083] [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc]
[497595.382959] [] ? default_wake_function+0x12/0x20
[497595.389311] [] ? __wake_up_common+0x5b/0x90
[497595.395263] [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
[497595.401650] [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc]
[497595.409132] [] kthread+0xd1/0xe0
[497595.414097] [] ? insert_kthread_work+0x40/0x40
[497595.420279] [] ret_from_fork_nospec_begin+0xe/0x21
[497595.426803] [] ? insert_kthread_work+0x40/0x40