[1648821.507091] Lustre: fir-OST0068: haven't heard from client 71fea641-0f84-41a9-8bea-7a67b49fa0d8 (at 10.51.15.6@o2ib3) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9d600b73b800, cur 1689976382 expire 1689976232 last 1689976155 [1648821.528987] Lustre: Skipped 3 previous similar messages [1648832.503831] Lustre: fir-OST006a: haven't heard from client 71fea641-0f84-41a9-8bea-7a67b49fa0d8 (at 10.51.15.6@o2ib3) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9d6b7a63dc00, cur 1689976393 expire 1689976243 last 1689976166 [1677714.751711] Lustre: fir-OST0068: haven't heard from client fc213d37-140a-40cc-8d6b-a79cfdd6e94f (at 10.50.1.51@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9da3db91f400, cur 1690005279 expire 1690005129 last 1690005052 [1677714.773783] Lustre: Skipped 2 previous similar messages [1677728.749671] Lustre: fir-OST006c: haven't heard from client fc213d37-140a-40cc-8d6b-a79cfdd6e94f (at 10.50.1.51@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9da9cbaa3000, cur 1690005293 expire 1690005143 last 1690005066 [1677728.772200] Lustre: Skipped 1 previous similar message [1677735.751789] Lustre: fir-OST006a: haven't heard from client fc213d37-140a-40cc-8d6b-a79cfdd6e94f (at 10.50.1.51@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9d9a7af6e800, cur 1690005300 expire 1690005150 last 1690005073 [1677790.751120] Lustre: fir-OST006e: haven't heard from client b64f50e5-a68e-4cd1-b06c-a15e4c0783b7 (at 10.51.12.2@o2ib3) in 177 seconds. I think it's dead, and I am evicting it. exp ffff9d8ea1234800, cur 1690005355 expire 1690005205 last 1690005178 [1677804.745445] Lustre: fir-OST006c: haven't heard from client b64f50e5-a68e-4cd1-b06c-a15e4c0783b7 (at 10.51.12.2@o2ib3) in 191 seconds. I think it's dead, and I am evicting it. exp ffff9d8ea1237400, cur 1690005369 expire 1690005219 last 1690005178 [1677804.767353] Lustre: Skipped 1 previous similar message [1677866.736881] Lustre: fir-OST006e: haven't heard from client 98648ea4-f716-4d69-a097-57ecf53cd7b4 (at 10.50.1.5@o2ib2) in 194 seconds. I think it's dead, and I am evicting it. exp ffff9dac84067000, cur 1690005431 expire 1690005281 last 1690005237 [1677866.758688] Lustre: Skipped 2 previous similar messages [1677942.728730] Lustre: fir-OST0068: haven't heard from client ec27c7b7-9e89-4277-b2fd-faeaa83e00d0 (at 10.51.12.16@o2ib3) in 177 seconds. I think it's dead, and I am evicting it. exp ffff9da9eda8cc00, cur 1690005507 expire 1690005357 last 1690005330 [1677942.750699] Lustre: Skipped 2 previous similar messages [1677992.717029] Lustre: fir-OST006a: haven't heard from client ec27c7b7-9e89-4277-b2fd-faeaa83e00d0 (at 10.51.12.16@o2ib3) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9da9ea29c000, cur 1690005557 expire 1690005407 last 1690005330 [1677992.738985] Lustre: Skipped 1 previous similar message [1679784.500085] Lustre: fir-OST0068: haven't heard from client 38a04cb6-22d9-4094-8107-3b881a8c0d38 (at 10.51.12.16@o2ib3) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9d93379a5c00, cur 1690007349 expire 1690007199 last 1690007122 [1679784.522060] Lustre: Skipped 1 previous similar message [1679796.477678] Lustre: fir-OST006a: haven't heard from client 38a04cb6-22d9-4094-8107-3b881a8c0d38 (at 10.51.12.16@o2ib3) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9daa062e3c00, cur 1690007361 expire 1690007211 last 1690007134 [1683629.983833] Lustre: fir-OST006e: haven't heard from client 07ce5c45-4b65-4201-84b8-ada6f34ba1e1 (at 10.51.16.16@o2ib3) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9da8441fd800, cur 1690011195 expire 1690011045 last 1690010968 [1683630.006085] Lustre: Skipped 2 previous similar messages [1683644.971634] Lustre: fir-OST0068: haven't heard from client 07ce5c45-4b65-4201-84b8-ada6f34ba1e1 (at 10.51.16.16@o2ib3) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9d990c90bc00, cur 1690011210 expire 1690011060 last 1690010983 [1683648.911464] LustreError: 59300:0:(ldlm_lockd.c:719:ldlm_handle_ast_error()) ### client (nid 10.51.16.16@o2ib3) returned error from glimpse AST (req@ffff9d57a9069200 x1770372654679552 status -107 rc -107), evict it ns: filter-fir-OST006c_UUID lock: ffff9d84ad0cc140/0x2969ebf35840e885 lrc: 3/0,0 mode: PW/PW res: [0x1e00000400:0x3cf1f4b:0x0].0x0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->4095) gid 0 flags: 0x40000000000000 nid: 10.51.16.16@o2ib3 remote: 0xfd6eea3ecaa4c97c expref: 6 pid: 46582 timeout: 0 lvb_type: 0 [1683648.957636] LustreError: 138-a: fir-OST006c: A client on nid 10.51.16.16@o2ib3 was evicted due to a lock glimpse callback time out: rc -107 [1683648.970373] LustreError: 58392:0:(ldlm_lockd.c:259:expired_lock_main()) ### lock callback timer expired after 1683867s: evicting client at 10.51.16.16@o2ib3 ns: filter-fir-OST006c_UUID lock: ffff9d84ad0cc140/0x2969ebf35840e885 lrc: 4/0,0 mode: PW/PW res: [0x1e00000400:0x3cf1f4b:0x0].0x0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->4095) gid 0 flags: 0x40000000000000 nid: 10.51.16.16@o2ib3 remote: 0xfd6eea3ecaa4c97c expref: 7 pid: 46582 timeout: 0 lvb_type: 0 [1687135.521728] Lustre: fir-OST006e: haven't heard from client 0591febd-d571-430d-8282-8d411fe201f5 (at 10.51.12.5@o2ib3) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9d674ba46c00, cur 1690014701 expire 1690014551 last 1690014474 [1687135.543609] Lustre: Skipped 1 previous similar message [1687150.513288] Lustre: fir-OST006a: haven't heard from client 0591febd-d571-430d-8282-8d411fe201f5 (at 10.51.12.5@o2ib3) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9d646af55800, cur 1690014716 expire 1690014566 last 1690014489 [1692278.547066] md: md4: recovery done. [1700345.802390] Lustre: fir-OST006c: haven't heard from client 524b983a-0bed-4055-9a5c-496df3742557 (at 10.51.16.9@o2ib3) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9d6b3d542800, cur 1690027913 expire 1690027763 last 1690027686 [1700345.824262] Lustre: Skipped 2 previous similar messages [1700353.777096] Lustre: fir-OST006a: haven't heard from client 524b983a-0bed-4055-9a5c-496df3742557 (at 10.51.16.9@o2ib3) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9d6cb8d19800, cur 1690027921 expire 1690027771 last 1690027694 [1700359.775840] Lustre: fir-OST006e: haven't heard from client 524b983a-0bed-4055-9a5c-496df3742557 (at 10.51.16.9@o2ib3) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9d6020858400, cur 1690027927 expire 1690027777 last 1690027700 [1700359.797711] Lustre: Skipped 1 previous similar message [1715741.750467] Lustre: fir-OST006c: haven't heard from client 1d3e4a83-da14-41f7-9862-5dc69ffe193e (at 10.51.12.14@o2ib3) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9d84e28ad400, cur 1690043311 expire 1690043161 last 1690043084 [1715745.750141] Lustre: fir-OST006e: haven't heard from client 1d3e4a83-da14-41f7-9862-5dc69ffe193e (at 10.51.12.14@o2ib3) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9d8d8478dc00, cur 1690043315 expire 1690043165 last 1690043088 [1715750.746349] Lustre: fir-OST0068: haven't heard from client 1d3e4a83-da14-41f7-9862-5dc69ffe193e (at 10.51.12.14@o2ib3) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9daa05971000, cur 1690043320 expire 1690043170 last 1690043093 [1721711.958534] Lustre: fir-OST006a: haven't heard from client d10d241c-9c5a-49f0-8c96-ef4a513f719f (at 10.51.18.19@o2ib3) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9d6cd8eb1000, cur 1690049282 expire 1690049132 last 1690049055 [1721711.980487] Lustre: Skipped 1 previous similar message [1721716.961404] Lustre: fir-OST006e: haven't heard from client d10d241c-9c5a-49f0-8c96-ef4a513f719f (at 10.51.18.19@o2ib3) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9d612d2f3000, cur 1690049287 expire 1690049137 last 1690049060 [1721716.983514] Lustre: Skipped 2 previous similar messages [1724919.533088] Lustre: fir-OST006e: haven't heard from client c8f8a89b-ebb6-42b8-9fcd-e3f89b8d0f2d (at 10.51.12.12@o2ib3) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9d9342677400, cur 1690052490 expire 1690052340 last 1690052263 [1743612.060488] Lustre: fir-OST006c: haven't heard from client 02e1b681-ed4c-41e2-a73f-a86eae428c0d (at 10.51.12.3@o2ib3) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9d58370ff000, cur 1690071185 expire 1690071035 last 1690070958 [1743612.082356] Lustre: Skipped 3 previous similar messages [1753323.783676] Lustre: fir-OST006a: haven't heard from client 7020a658-98b4-4c2f-8962-1e9837bf96c2 (at 10.50.1.3@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9daa10928c00, cur 1690080898 expire 1690080748 last 1690080671 [1753323.805456] Lustre: Skipped 3 previous similar messages [1753341.777336] Lustre: fir-OST006c: haven't heard from client 7020a658-98b4-4c2f-8962-1e9837bf96c2 (at 10.50.1.3@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9daa09dc9c00, cur 1690080916 expire 1690080766 last 1690080689 [1753341.799119] Lustre: Skipped 2 previous similar messages [1762637.547586] Lustre: fir-OST006a: haven't heard from client d3fc51f0-a7bd-4b94-869f-1869e536a489 (at 10.50.7.55@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9da9e6070c00, cur 1690090213 expire 1690090063 last 1690089986 [1762881.515507] Lustre: fir-OST006a: haven't heard from client 29cde15a-dbed-499e-bcaf-549418232c1c (at 10.51.13.17@o2ib3) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9d93395b4800, cur 1690090457 expire 1690090307 last 1690090230 [1762881.537457] Lustre: Skipped 3 previous similar messages [1762889.514141] Lustre: fir-OST0068: haven't heard from client 29cde15a-dbed-499e-bcaf-549418232c1c (at 10.51.13.17@o2ib3) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9d9296db0800, cur 1690090465 expire 1690090315 last 1690090238 [1765626.156509] Lustre: fir-OST0068: haven't heard from client b47ac8d9-fcfe-4896-80b7-268b1d599efd (at 10.51.12.2@o2ib3) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9d674b2cdc00, cur 1690093202 expire 1690093052 last 1690092975 [1765626.178391] Lustre: Skipped 2 previous similar messages [1765631.157618] Lustre: fir-OST006e: haven't heard from client b47ac8d9-fcfe-4896-80b7-268b1d599efd (at 10.51.12.2@o2ib3) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9daa0997e000, cur 1690093207 expire 1690093057 last 1690092980 [1765631.179494] Lustre: Skipped 1 previous similar message [1765637.167274] Lustre: fir-OST006c: haven't heard from client b47ac8d9-fcfe-4896-80b7-268b1d599efd (at 10.51.12.2@o2ib3) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9d83be55c800, cur 1690093213 expire 1690093063 last 1690092986 [1770987.459408] Lustre: fir-OST0068: haven't heard from client d88e50ec-8d26-4c95-b5ca-60b4fd95fb54 (at 10.51.12.16@o2ib3) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9d838c3a7400, cur 1690098564 expire 1690098414 last 1690098337 [1770993.461664] Lustre: fir-OST006a: haven't heard from client d88e50ec-8d26-4c95-b5ca-60b4fd95fb54 (at 10.51.12.16@o2ib3) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9d83a720c000, cur 1690098570 expire 1690098420 last 1690098343 [1771004.445940] Lustre: fir-OST006c: haven't heard from client d88e50ec-8d26-4c95-b5ca-60b4fd95fb54 (at 10.51.12.16@o2ib3) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9d6cff326c00, cur 1690098581 expire 1690098431 last 1690098354 [1771625.022648] md: data-check of RAID array md2 [1771631.205759] md: data-check of RAID array md4 [1771637.124006] md: data-check of RAID array md6 [1771643.212506] md: data-check of RAID array md0 [1774523.986200] Lustre: fir-OST006e: haven't heard from client 0c209a59-63b7-47b3-8930-dc99e93478d3 (at 10.51.12.2@o2ib3) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9d8391737400, cur 1690102101 expire 1690101951 last 1690101874 [1774524.008066] Lustre: Skipped 1 previous similar message [1775063.728599] LustreError: 63825:0:(ldlm_lockd.c:719:ldlm_handle_ast_error()) ### client (nid 10.50.9.27@o2ib2) returned error from glimpse AST (req@ffff9d8bab5e0000 x1770374557386944 status -107 rc -107), evict it ns: filter-fir-OST006e_UUID lock: ffff9d839021de80/0x2969ebf372942eac lrc: 3/0,0 mode: PW/PW res: [0x1e80000400:0x3c8eaf7:0x0].0x0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->4095) gid 0 flags: 0x40000000020000 nid: 10.50.9.27@o2ib2 remote: 0x84c44d4110174d32 expref: 7 pid: 53580 timeout: 0 lvb_type: 0 [1775063.776896] LustreError: 138-a: fir-OST006e: A client on nid 10.50.9.27@o2ib2 was evicted due to a lock glimpse callback time out: rc -107 [1775063.791666] LustreError: 58392:0:(ldlm_lockd.c:259:expired_lock_main()) ### lock callback timer expired after 1775294s: evicting client at 10.50.9.27@o2ib2 ns: filter-fir-OST006e_UUID lock: ffff9d839021de80/0x2969ebf372942eac lrc: 4/0,0 mode: PW/PW res: [0x1e80000400:0x3c8eaf7:0x0].0x0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->4095) gid 0 flags: 0x40000000020000 nid: 10.50.9.27@o2ib2 remote: 0x84c44d4110174d32 expref: 8 pid: 53580 timeout: 0 lvb_type: 0 [1775096.915829] Lustre: fir-OST006c: haven't heard from client 92326ebe-05ba-43b6-817d-1950176a0617 (at 10.50.9.27@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9d94e1452000, cur 1690102674 expire 1690102524 last 1690102447 [1775096.937775] Lustre: Skipped 3 previous similar messages [1775107.912142] Lustre: fir-OST0068: haven't heard from client 92326ebe-05ba-43b6-817d-1950176a0617 (at 10.50.9.27@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9d94e1455400, cur 1690102685 expire 1690102535 last 1690102458 [1780489.213748] Lustre: fir-OST006a: haven't heard from client c2deb207-3f64-4982-8994-d55811ac4a74 (at 10.51.12.6@o2ib3) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9da9e52a0400, cur 1690108067 expire 1690107917 last 1690107840 [1780489.235998] Lustre: Skipped 1 previous similar message [1780494.212537] Lustre: fir-OST006e: haven't heard from client c2deb207-3f64-4982-8994-d55811ac4a74 (at 10.51.12.6@o2ib3) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9da8fa3a1400, cur 1690108072 expire 1690107922 last 1690107845 [1780502.209019] Lustre: fir-OST0068: haven't heard from client c2deb207-3f64-4982-8994-d55811ac4a74 (at 10.51.12.6@o2ib3) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9d856f2a9000, cur 1690108080 expire 1690107930 last 1690107853 [1783033.882428] Lustre: fir-OST0068: haven't heard from client ad82afb5-ec8f-46d6-91a1-5cd3ed556a29 (at 10.51.12.5@o2ib3) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9d57146d7400, cur 1690110612 expire 1690110462 last 1690110385 [1783033.904323] Lustre: Skipped 1 previous similar message [1783038.878736] Lustre: fir-OST006a: haven't heard from client ad82afb5-ec8f-46d6-91a1-5cd3ed556a29 (at 10.51.12.5@o2ib3) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9d57661c0800, cur 1690110617 expire 1690110467 last 1690110390 [1783038.900746] Lustre: Skipped 1 previous similar message [1783046.877896] Lustre: fir-OST006c: haven't heard from client ad82afb5-ec8f-46d6-91a1-5cd3ed556a29 (at 10.51.12.5@o2ib3) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9d6b647a9800, cur 1690110625 expire 1690110475 last 1690110398 [1785797.521512] Lustre: fir-OST006a: haven't heard from client cffa9d1c-4909-4af8-8ae4-96041ae9269f (at 10.50.9.27@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9d6b71fef400, cur 1690113376 expire 1690113226 last 1690113149 [1785805.530710] Lustre: fir-OST006c: haven't heard from client cffa9d1c-4909-4af8-8ae4-96041ae9269f (at 10.50.9.27@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9d92a400d400, cur 1690113384 expire 1690113234 last 1690113157 [1785807.522624] Lustre: fir-OST0068: haven't heard from client cffa9d1c-4909-4af8-8ae4-96041ae9269f (at 10.50.9.27@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9d93d8c0fc00, cur 1690113386 expire 1690113236 last 1690113159 [1799317.771951] Lustre: fir-OST006c: haven't heard from client b01be690-151c-4f4d-bc2b-930ef04691a4 (at 10.50.9.27@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9da83a091800, cur 1690126898 expire 1690126748 last 1690126671 [1799317.793899] Lustre: Skipped 1 previous similar message [1799322.764717] Lustre: fir-OST0068: haven't heard from client b01be690-151c-4f4d-bc2b-930ef04691a4 (at 10.50.9.27@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9da834713800, cur 1690126903 expire 1690126753 last 1690126676 [1799322.786748] Lustre: Skipped 1 previous similar message [1804313.141401] Lustre: fir-OST006a: haven't heard from client 0ec048d4-acb0-42da-8540-7e8703798c3b (at 10.50.9.27@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9d9294027800, cur 1690131894 expire 1690131744 last 1690131667 [1804313.163302] Lustre: Skipped 1 previous similar message [1804317.125802] Lustre: fir-OST006e: haven't heard from client 0ec048d4-acb0-42da-8540-7e8703798c3b (at 10.50.9.27@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9da9f5d69400, cur 1690131898 expire 1690131748 last 1690131671 [1804329.124264] Lustre: fir-OST006c: haven't heard from client 0ec048d4-acb0-42da-8540-7e8703798c3b (at 10.50.9.27@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9da9f5d6bc00, cur 1690131910 expire 1690131760 last 1690131683 [1806024.909032] Lustre: fir-OST0068: haven't heard from client a3eccd8a-408c-4f14-aa03-9b4c46485bce (at 10.51.12.2@o2ib3) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9d84b6efb400, cur 1690133606 expire 1690133456 last 1690133379 [1806024.930907] Lustre: Skipped 1 previous similar message [1806036.917574] Lustre: fir-OST006c: haven't heard from client a3eccd8a-408c-4f14-aa03-9b4c46485bce (at 10.51.12.2@o2ib3) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9d6b639e1000, cur 1690133618 expire 1690133468 last 1690133391 [1806036.939601] Lustre: Skipped 2 previous similar messages [1811653.198902] Lustre: fir-OST006e: haven't heard from client f88b117a-2dfb-4020-a1ed-0b7d3c101b51 (at 10.51.5.70@o2ib3) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9d73384d9c00, cur 1690139235 expire 1690139085 last 1690139008 [1811658.194814] Lustre: fir-OST006c: haven't heard from client f88b117a-2dfb-4020-a1ed-0b7d3c101b51 (at 10.51.5.70@o2ib3) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9d7335978400, cur 1690139240 expire 1690139090 last 1690139013 [1811658.216698] Lustre: Skipped 1 previous similar message [1811663.205207] Lustre: fir-OST006a: haven't heard from client f88b117a-2dfb-4020-a1ed-0b7d3c101b51 (at 10.51.5.70@o2ib3) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9d6d6f4ffc00, cur 1690139245 expire 1690139095 last 1690139018 [1813355.980532] Lustre: fir-OST006a: haven't heard from client 8e732cec-370e-43b8-b43f-4d04544c675f (at 10.51.12.4@o2ib3) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9da9f65f3c00, cur 1690140938 expire 1690140788 last 1690140711 [1813361.981048] Lustre: fir-OST006e: haven't heard from client 8e732cec-370e-43b8-b43f-4d04544c675f (at 10.51.12.4@o2ib3) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9da9f65f2000, cur 1690140944 expire 1690140794 last 1690140717 [1832319.327752] md: md2: data-check done. [1832515.065193] md: md6: data-check done. [1832815.601048] md: md0: data-check done. [1834425.974406] md: md4: data-check done. [1851784.003005] Lustre: fir-OST006a: haven't heard from client c359d0ba-0717-4f6d-8de3-e73278ee09c3 (at 10.51.4.35@o2ib3) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9d6d68174800, cur 1690179371 expire 1690179221 last 1690179144 [1851784.025161] Lustre: Skipped 2 previous similar messages [1851789.000479] Lustre: fir-OST006e: haven't heard from client c359d0ba-0717-4f6d-8de3-e73278ee09c3 (at 10.51.4.35@o2ib3) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9d6d74281400, cur 1690179376 expire 1690179226 last 1690179149 [1851789.022500] Lustre: Skipped 1 previous similar message [1851804.004630] Lustre: fir-OST006c: haven't heard from client c359d0ba-0717-4f6d-8de3-e73278ee09c3 (at 10.51.4.35@o2ib3) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9d733078a400, cur 1690179391 expire 1690179241 last 1690179164 [1852208.954452] Lustre: fir-OST0068: haven't heard from client 7697e46c-d7b3-469c-8b43-82ec17a8f791 (at 10.51.12.7@o2ib3) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9d5837221c00, cur 1690179796 expire 1690179646 last 1690179569 [1852213.950195] Lustre: fir-OST006c: haven't heard from client 7697e46c-d7b3-469c-8b43-82ec17a8f791 (at 10.51.12.7@o2ib3) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9d6d6ea21800, cur 1690179801 expire 1690179651 last 1690179574 [1852227.404224] LustreError: 53579:0:(ldlm_lockd.c:719:ldlm_handle_ast_error()) ### client (nid 10.51.12.7@o2ib3) returned error from glimpse AST (req@ffff9d83c739d580 x1770375852405248 status -107 rc -107), evict it ns: filter-fir-OST006e_UUID lock: ffff9d565520e9c0/0x2969ebf383a76f9e lrc: 3/0,0 mode: PW/PW res: [0x1e80000400:0x3ca30ec:0x0].0x0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->4095) gid 0 flags: 0x40000000020000 nid: 10.51.12.7@o2ib3 remote: 0xf3cbcf8ef7f07907 expref: 9 pid: 42414 timeout: 0 lvb_type: 0 [1852227.450189] LustreError: 53579:0:(ldlm_lockd.c:719:ldlm_handle_ast_error()) Skipped 1 previous similar message [1852227.460353] LustreError: 138-a: fir-OST006e: A client on nid 10.51.12.7@o2ib3 was evicted due to a lock glimpse callback time out: rc -107 [1852227.473085] LustreError: Skipped 1 previous similar message [1852227.479095] LustreError: 58392:0:(ldlm_lockd.c:259:expired_lock_main()) ### lock callback timer expired after 1852468s: evicting client at 10.51.12.7@o2ib3 ns: filter-fir-OST006e_UUID lock: ffff9d565520e9c0/0x2969ebf383a76f9e lrc: 3/0,0 mode: PW/PW res: [0x1e80000400:0x3ca30ec:0x0].0x0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->4095) gid 0 flags: 0x40000000020000 nid: 10.51.12.7@o2ib3 remote: 0xf3cbcf8ef7f07907 expref: 10 pid: 42414 timeout: 0 lvb_type: 0 [1866986.997944] Lustre: fir-OST0068: haven't heard from client c578de58-f675-4b6b-98fd-d5e607ed5c24 (at 10.51.12.8@o2ib3) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9da3eea11c00, cur 1690194576 expire 1690194426 last 1690194349 [1866987.020003] Lustre: Skipped 1 previous similar message [1866991.998129] Lustre: fir-OST006a: haven't heard from client c578de58-f675-4b6b-98fd-d5e607ed5c24 (at 10.51.12.8@o2ib3) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9d9414bdb800, cur 1690194581 expire 1690194431 last 1690194354 [1866992.020118] Lustre: Skipped 1 previous similar message [1884194.731395] Lustre: fir-OST0068: haven't heard from client 0899019e-119d-4434-8266-88a936bde75d (at 10.51.6.16@o2ib3) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9db32cc7b400, cur 1690211786 expire 1690211636 last 1690211559 [1884194.753314] Lustre: Skipped 1 previous similar message [1884200.738751] Lustre: fir-OST006e: haven't heard from client 0899019e-119d-4434-8266-88a936bde75d (at 10.51.6.16@o2ib3) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9db291335c00, cur 1690211792 expire 1690211642 last 1690211565 [1884200.760694] Lustre: Skipped 1 previous similar message [1891808.741695] Lustre: fir-OST006a: haven't heard from client f2ec5542-8483-41b7-9186-bbd4eb58531a (at 10.51.12.12@o2ib3) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9da9dfb4a800, cur 1690219401 expire 1690219251 last 1690219174 [1891808.763646] Lustre: Skipped 1 previous similar message [1891819.730033] Lustre: fir-OST006c: haven't heard from client f2ec5542-8483-41b7-9186-bbd4eb58531a (at 10.51.12.12@o2ib3) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9daf9bc16c00, cur 1690219412 expire 1690219262 last 1690219185 [1891828.725758] Lustre: fir-OST0068: haven't heard from client f2ec5542-8483-41b7-9186-bbd4eb58531a (at 10.51.12.12@o2ib3) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9d838b560800, cur 1690219421 expire 1690219271 last 1690219194 [1892716.609774] Lustre: fir-OST006a: haven't heard from client f6fd225a-3fa5-4e2d-9ea0-d5e1611d3c21 (at 10.51.12.5@o2ib3) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9d584c765000, cur 1690220309 expire 1690220159 last 1690220082 [1892716.631644] Lustre: Skipped 1 previous similar message [1900842.537316] Lustre: fir-OST006a: haven't heard from client 4ba1af0c-a36d-47c8-8c7b-f5e721bc2aa8 (at 10.51.15.9@o2ib3) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9da9f8dae400, cur 1690228436 expire 1690228286 last 1690228209 [1900842.559392] Lustre: Skipped 3 previous similar messages [1900854.531116] Lustre: fir-OST006c: haven't heard from client 4ba1af0c-a36d-47c8-8c7b-f5e721bc2aa8 (at 10.51.15.9@o2ib3) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9daa073d4400, cur 1690228448 expire 1690228298 last 1690228221 [1902253.346422] Lustre: fir-OST006e: haven't heard from client e80e2ea6-be27-4d84-974e-e7f65f45938f (at 10.51.8.7@o2ib3) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9d6b7193e800, cur 1690229847 expire 1690229697 last 1690229620 [1902253.368206] Lustre: Skipped 2 previous similar messages [1902259.344380] Lustre: fir-OST006c: haven't heard from client e80e2ea6-be27-4d84-974e-e7f65f45938f (at 10.51.8.7@o2ib3) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9d6b7908b800, cur 1690229853 expire 1690229703 last 1690229626 [1902259.366167] Lustre: Skipped 1 previous similar message [1903293.206006] Lustre: fir-OST006c: haven't heard from client 1f111d09-8ecc-49f7-be37-bc650177f85c (at 10.51.2.50@o2ib3) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9db33e953400, cur 1690230887 expire 1690230737 last 1690230660 [1903293.227874] Lustre: Skipped 1 previous similar message [1903651.158729] Lustre: fir-OST006a: haven't heard from client 8242ca14-a37e-4191-a6c3-9f1ca886445c (at 10.51.0.65@o2ib3) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9d5c37aaa800, cur 1690231245 expire 1690231095 last 1690231018 [1903651.180603] Lustre: Skipped 3 previous similar messages [1905754.880886] Lustre: fir-OST006e: haven't heard from client b7f3444f-8519-4760-84e8-cde6c7f02fe7 (at 10.51.1.72@o2ib3) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9d7119691800, cur 1690233349 expire 1690233199 last 1690233122 [1905754.902765] Lustre: Skipped 3 previous similar messages [1909777.348519] Lustre: fir-OST006c: haven't heard from client 0d29f22f-8981-4ffe-9cf6-96139efb35ff (at 10.51.12.16@o2ib3) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9daa06adfc00, cur 1690237372 expire 1690237222 last 1690237145 [1909777.370476] Lustre: Skipped 3 previous similar messages [1909787.348082] Lustre: fir-OST0068: haven't heard from client 0d29f22f-8981-4ffe-9cf6-96139efb35ff (at 10.51.12.16@o2ib3) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9d84cac57800, cur 1690237382 expire 1690237232 last 1690237155 [1909787.370497] Lustre: Skipped 1 previous similar message [1915198.637840] Lustre: fir-OST006a: haven't heard from client 3d7da4d0-2d9c-49ec-86af-5029ec98ef3d (at 10.50.1.59@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9db292086000, cur 1690242794 expire 1690242644 last 1690242567 [1915198.659716] Lustre: Skipped 1 previous similar message [1915205.627166] Lustre: fir-OST006e: haven't heard from client 3d7da4d0-2d9c-49ec-86af-5029ec98ef3d (at 10.50.1.59@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9db32d45dc00, cur 1690242801 expire 1690242651 last 1690242574 [1915205.649032] Lustre: Skipped 1 previous similar message [1915216.624653] Lustre: fir-OST006c: haven't heard from client 3d7da4d0-2d9c-49ec-86af-5029ec98ef3d (at 10.50.1.59@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9d6d6f06a000, cur 1690242812 expire 1690242662 last 1690242585 [1916266.720320] Lustre: fir-OST006a: Client ad3a3cde-6aba-439a-b244-c139428a7aa5 (at 10.51.9.57@o2ib3) reconnecting [1916266.730576] Lustre: Skipped 1 previous similar message [1916272.976562] LustreError: 26712:0:(events.c:476:server_bulk_callback()) event type 5, status -5, desc ffff9d64b940c800 [1916272.977062] LNetError: 26687:0:(lib-lnet.h:1241:lnet_set_route_aliveness()) route to o2ib3 through 10.0.10.232@o2ib7 has gone from up to down [1916273.000195] LustreError: 26712:0:(events.c:476:server_bulk_callback()) Skipped 18 previous similar messages [1916274.305857] LNetError: 61189:0:(lib-move.c:4005:lnet_handle_recovery_reply()) peer NI (10.0.10.232@o2ib7) recovery failed with -113 [1916278.518778] Lustre: fir-OST006c: Client 39ea96dd-bf03-478b-9c67-76ec583d3a52 (at 10.51.7.21@o2ib3) reconnecting [1916278.529039] Lustre: Skipped 72 previous similar messages [1916288.303977] LNetError: 61189:0:(lib-move.c:4005:lnet_handle_recovery_reply()) peer NI (10.0.10.232@o2ib7) recovery failed with -113 [1916288.315967] LNetError: 61189:0:(lib-move.c:4005:lnet_handle_recovery_reply()) Skipped 2 previous similar messages [1916301.004330] Lustre: fir-OST0068: Client a8d50895-5fcb-4609-8cad-942f0f673040 (at 10.51.11.6@o2ib3) reconnecting [1916308.892241] LustreError: 137-5: fir-OST0069_UUID: not available for connect from 10.51.14.24@o2ib3 (no target). If you are running an HA pair check that the target is mounted on the other server. [1916308.909775] LustreError: Skipped 1 previous similar message [1916313.300492] LustreError: 47327:0:(ldlm_lib.c:3550:target_bulk_io()) @@@ network error on bulk READ req@ffff9d56386bd100 x1771637566721600/t0(0) o3->fe67f592-b22b-4874-8a2e-8c5b342db4a5@10.51.18.1@o2ib3:501/0 lens 488/440 e 0 to 0 dl 1690243916 ref 1 fl Interpret:/0/0 rc 0/0 job:'25059984' [1916313.302355] Lustre: fir-OST006c: Bulk IO read error with 3c210b3c-77bb-428d-98d0-97ac8d3fb33b (at 10.51.12.10@o2ib3), client will retry: rc -110 [1916313.302356] Lustre: Skipped 13 previous similar messages [1916313.345009] LustreError: 47327:0:(ldlm_lib.c:3550:target_bulk_io()) Skipped 13 previous similar messages [1916336.300661] LNetError: 61811:0:(lib-move.c:4005:lnet_handle_recovery_reply()) peer NI (10.0.10.232@o2ib7) recovery failed with -113 [1916336.312654] LNetError: 61811:0:(lib-move.c:4005:lnet_handle_recovery_reply()) Skipped 1 previous similar message [1916375.088273] Lustre: 46824:0:(client.c:2295:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1690243842/real 1690243842] req@ffff9d564687d100 x1770377209953984/t0(0) o106->fir-OST0068@10.51.8.38@o2ib3:15/16 lens 328/280 e 0 to 1 dl 1690243970 ref 1 fl Rpc:XQr/0/ffffffff rc 0/-1 job:'' [1916375.088274] Lustre: 63834:0:(client.c:2295:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1690243842/real 1690243842] req@ffff9d82dcc59f80 x1770377209954176/t0(0) o106->fir-OST0068@10.51.1.17@o2ib3:15/16 lens 328/280 e 0 to 1 dl 1690243970 ref 1 fl Rpc:XQr/0/ffffffff rc 0/-1 job:'' [1916375.195691] Lustre: fir-OST006a: Client 3c210b3c-77bb-428d-98d0-97ac8d3fb33b (at 10.51.12.10@o2ib3) reconnecting [1916375.206037] Lustre: Skipped 10 previous similar messages [1916376.195989] LustreError: 132-0: fir-OST006a: BAD READ CHECKSUM: should have changed on the client or in transit: from 10.51.12.10@o2ib3 inode [0x2c006dd1d:0x30ae:0x0] object 0x1d80000402:295064191 extent [125829120-134217727], client returned csum 0 (type 20), server csum 8161f0e2 (type 20) [1916379.965571] LustreError: 132-0: fir-OST0068: BAD READ CHECKSUM: should have changed on the client or in transit: from 10.51.11.16@o2ib3 inode [0x20005adde:0x1679a:0x0] object 0x0:10501143 extent [0-8191], client returned csum 0 (type 10), server csum 7e3114b (type 10) [1916400.301520] LNetError: 1533:0:(lib-move.c:4005:lnet_handle_recovery_reply()) peer NI (10.0.10.232@o2ib7) recovery failed with -113 [1916528.302212] LNetError: 1533:0:(lib-move.c:4005:lnet_handle_recovery_reply()) peer NI (10.0.10.232@o2ib7) recovery failed with -113 [1916784.301160] LNetError: 1533:0:(lib-move.c:4005:lnet_handle_recovery_reply()) peer NI (10.0.10.232@o2ib7) recovery failed with -113 [1917296.311447] LNetError: 61189:0:(lib-move.c:4005:lnet_handle_recovery_reply()) peer NI (10.0.10.232@o2ib7) recovery failed with -113 [1917475.342661] LNetError: 26711:0:(lib-lnet.h:1241:lnet_set_route_aliveness()) route to o2ib3 through 10.0.10.232@o2ib7 has gone from down to up [1918078.248087] Lustre: fir-OST006a: Client 338bc677-8b21-4304-b7e7-b39fece90048 (at 10.51.6.44@o2ib3) reconnecting [1918078.248088] Lustre: fir-OST006c: Client 338bc677-8b21-4304-b7e7-b39fece90048 (at 10.51.6.44@o2ib3) reconnecting [1918078.248089] Lustre: Skipped 4 previous similar messages [1918078.623584] LustreError: 8056:0:(ldlm_lib.c:3544:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff9d563c309b00 x1771846044784448/t0(0) o3->338bc677-8b21-4304-b7e7-b39fece90048@10.51.6.44@o2ib3:67/0 lens 488/440 e 0 to 0 dl 1690245747 ref 1 fl Interpret:/0/0 rc 0/0 job:'25102265' [1918083.061638] LustreError: 26716:0:(events.c:476:server_bulk_callback()) event type 5, status -5, desc ffff9d921c19f000 [1918083.072437] LustreError: 26716:0:(events.c:476:server_bulk_callback()) Skipped 4 previous similar messages [1918083.072641] LNetError: 62997:0:(lib-lnet.h:1241:lnet_set_route_aliveness()) route to o2ib2 through 10.0.10.225@o2ib7 has gone from up to down [1918083.096096] Lustre: fir-OST006c: Bulk IO read error with 338bc677-8b21-4304-b7e7-b39fece90048 (at 10.51.6.44@o2ib3), client will retry: rc -110 [1918083.109280] Lustre: Skipped 4 previous similar messages [1918083.348171] LNetError: 62997:0:(lib-move.c:4005:lnet_handle_recovery_reply()) peer NI (10.0.10.225@o2ib7) recovery failed with -113 [1918087.636277] Lustre: fir-OST0068: Client 3cfc4cca-1c79-45cf-9ac8-4f29a97f2238 (at 10.50.9.57@o2ib2) reconnecting [1918087.646530] Lustre: Skipped 490 previous similar messages [1918107.818406] Lustre: fir-OST006a: Client 2ef68ee4-9c1e-4254-a992-7be8fe0d83bd (at 10.50.5.33@o2ib2) reconnecting [1918107.828671] Lustre: Skipped 114 previous similar messages [1918109.455459] LustreError: 137-5: fir-OST006d_UUID: not available for connect from 10.50.8.44@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. [1918112.345168] LustreError: 8154:0:(sec.c:2543:sptlrpc_svc_unwrap_bulk()) @@@ truncated bulk GET 5905437(11148317) req@ffff9d561acf1200 x1770439509320768/t0(0) o4->f36d8f96-fa40-46b4-b805-ce0cb7af1889@10.50.6.67@o2ib2:47/0 lens 488/448 e 0 to 0 dl 1690245727 ref 1 fl Interpret:/0/0 rc 0/0 job:'19298441' [1918112.345174] LustreError: 8158:0:(ldlm_lib.c:3550:target_bulk_io()) @@@ network error on bulk WRITE req@ffff9d8cc94c7080 x1770651964325504/t0(0) o4->b7fcf141-7247-4663-a17e-153c4e20b14e@10.50.10.45@o2ib2:47/0 lens 488/448 e 0 to 0 dl 1690245727 ref 1 fl Interpret:/0/0 rc 0/0 job:'25057436' [1918112.345181] Lustre: fir-OST006a: Bulk IO write error with b7fcf141-7247-4663-a17e-153c4e20b14e (at 10.50.10.45@o2ib2), client will retry: rc = -110 [1918112.345182] Lustre: Skipped 12 previous similar messages [1918112.416768] LustreError: 8154:0:(sec.c:2543:sptlrpc_svc_unwrap_bulk()) Skipped 13 previous similar messages [1918112.671825] LustreError: 137-5: fir-OST006d_UUID: not available for connect from 10.50.8.25@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. [1918116.160956] LustreError: 137-5: fir-OST006d_UUID: not available for connect from 10.50.1.39@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. [1918116.178404] LustreError: Skipped 1 previous similar message [1918116.740573] LNetError: 26687:0:(o2iblnd_cb.c:3375:kiblnd_check_txs_locked()) Timed out tx: tx_queue(WSQ:001), 58 seconds [1918116.751608] LNetError: 26687:0:(o2iblnd_cb.c:3375:kiblnd_check_txs_locked()) Skipped 4 previous similar messages [1918116.762105] LNetError: 26687:0:(o2iblnd_cb.c:3445:kiblnd_check_conns()) Timed out RDMA with 10.0.10.226@o2ib7 (58): c: 0, oc: 0, rc: 1 [1918116.774344] LNetError: 26687:0:(o2iblnd_cb.c:3445:kiblnd_check_conns()) Skipped 4 previous similar messages [1918116.785703] LustreError: 26687:0:(events.c:476:server_bulk_callback()) event type 5, status -103, desc ffff9d94a54b7800 [1918116.796850] LustreError: 26687:0:(events.c:476:server_bulk_callback()) Skipped 4 previous similar messages [1918116.806755] Lustre: fir-OST006e: Bulk IO read error with 9c562b8b-ebaf-4071-8431-9d968ad1c42f (at 10.50.9.32@o2ib2), client will retry: rc -110 [1918116.806797] LNetError: 26687:0:(lib-lnet.h:1241:lnet_set_route_aliveness()) route to o2ib2 through 10.0.10.226@o2ib7 has gone from up to down [1918116.806798] LNetError: 26687:0:(lib-lnet.h:1241:lnet_set_route_aliveness()) Skipped 2 previous similar messages [1918116.842991] Lustre: Skipped 2 previous similar messages [1918118.758203] LustreError: 137-5: fir-OST006d_UUID: not available for connect from 10.51.4.72@o2ib3 (no target). If you are running an HA pair check that the target is mounted on the other server. [1918124.448138] LustreError: 137-5: fir-OST006f_UUID: not available for connect from 10.51.7.41@o2ib3 (no target). If you are running an HA pair check that the target is mounted on the other server. [1918124.465587] LustreError: Skipped 4 previous similar messages [1918132.614273] LustreError: 137-5: fir-OST006d_UUID: not available for connect from 10.50.4.32@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. [1918132.631722] LustreError: Skipped 9 previous similar messages [1918136.342027] LustreError: 8104:0:(ldlm_lib.c:3550:target_bulk_io()) @@@ network error on bulk READ req@ffff9d56260ae780 x1771846185224960/t0(0) o3->a8d50895-5fcb-4609-8cad-942f0f673040@10.51.11.6@o2ib3:69/0 lens 488/440 e 0 to 0 dl 1690245749 ref 1 fl Interpret:/0/0 rc 0/0 job:'20312417' [1918136.342079] Lustre: fir-OST006a: Bulk IO read error with a8d50895-5fcb-4609-8cad-942f0f673040 (at 10.51.11.6@o2ib3), client will retry: rc -110 [1918136.381045] LustreError: 8104:0:(ldlm_lib.c:3550:target_bulk_io()) Skipped 7 previous similar messages [1918148.735635] Lustre: fir-OST006a: Client 280c0b23-0cf1-4e7b-aec6-d73a3d50ac10 (at 10.50.3.19@o2ib2) reconnecting [1918148.745890] Lustre: Skipped 281 previous similar messages [1918160.341136] LustreError: 8138:0:(ldlm_lib.c:3550:target_bulk_io()) @@@ network error on bulk READ req@ffff9d560b11f980 x1770425580310912/t0(0) o3->0e900437-40ea-4d6b-85f0-91e49a945d6f@10.50.5.65@o2ib2:84/0 lens 536/440 e 0 to 0 dl 1690245764 ref 1 fl Interpret:/0/0 rc 0/0 job:'25052477' [1918160.366739] LustreError: 8138:0:(ldlm_lib.c:3550:target_bulk_io()) Skipped 5 previous similar messages [1918186.791385] Lustre: 46646:0:(client.c:2295:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1690245654/real 1690245654] req@ffff9d561cfb7980 x1770377233907968/t0(0) o104->fir-OST006e@10.50.10.61@o2ib2:15/16 lens 328/224 e 0 to 1 dl 1690245782 ref 1 fl Rpc:XQr/0/ffffffff rc 0/-1 job:'' [1918186.819677] Lustre: 46646:0:(client.c:2295:ptlrpc_expire_one_request()) Skipped 3 previous similar messages [1918187.457170] LustreError: 132-0: fir-OST006a: BAD READ CHECKSUM: should have changed on the client or in transit: from 10.51.16.2@o2ib3 inode [0x28006ef8c:0x1d9cf:0x0] object 0x1d80000401:72493408 extent [0-8191], client returned csum 0 (type 20), server csum 61a0297 (type 20) [1918197.729942] LNet: 26687:0:(o2iblnd_cb.c:3418:kiblnd_check_conns()) Timed out tx for 10.0.10.225@o2ib7: 4 seconds [1918197.740285] LNet: 26687:0:(o2iblnd_cb.c:3418:kiblnd_check_conns()) Skipped 9 previous similar messages [1918223.777223] Lustre: fir-OST006a: Client 0e900437-40ea-4d6b-85f0-91e49a945d6f (at 10.50.5.65@o2ib2) reconnecting [1918223.787475] Lustre: Skipped 31 previous similar messages [1918250.511144] LNetError: 26712:0:(lib-lnet.h:1241:lnet_set_route_aliveness()) route to o2ib2 through 10.0.10.226@o2ib7 has gone from down to up [1918572.348393] Lustre: fir-OST006e: Client db68f1fb-88b0-46aa-8259-297200f1e573 (at 10.50.5.30@o2ib2) reconnecting [1918572.358656] Lustre: Skipped 1 previous similar message [1918572.534384] LustreError: 137-5: fir-OST006b_UUID: not available for connect from 10.50.15.10@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. [1918572.551938] LustreError: Skipped 9 previous similar messages [1918574.496065] LustreError: 8097:0:(ldlm_lib.c:3544:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff9d562ae27080 x1770426242908992/t0(0) o3->f37a2896-85e3-4982-b7f2-87755f0c7549@10.50.5.43@o2ib2:617/0 lens 488/440 e 0 to 0 dl 1690246297 ref 1 fl Interpret:/0/0 rc 0/0 job:'25052479' [1918574.521600] LustreError: 8097:0:(ldlm_lib.c:3544:target_bulk_io()) Skipped 3 previous similar messages [1918602.676367] LNetError: 26687:0:(o2iblnd_cb.c:3375:kiblnd_check_txs_locked()) Timed out tx: tx_queue(WSQ:101), 54 seconds [1918602.687604] LNetError: 26687:0:(o2iblnd_cb.c:3445:kiblnd_check_conns()) Timed out RDMA with 10.0.10.226@o2ib7 (54): c: 0, oc: 0, rc: 0 [1918602.700592] LustreError: 26687:0:(events.c:476:server_bulk_callback()) event type 5, status -103, desc ffff9d83b6383400 [1918602.711828] LustreError: 26687:0:(events.c:476:server_bulk_callback()) Skipped 20 previous similar messages [1918602.721942] Lustre: fir-OST006c: Bulk IO read error with f37a2896-85e3-4982-b7f2-87755f0c7549 (at 10.50.5.43@o2ib2), client will retry: rc -110 [1918602.722595] LNetError: 26664:0:(lib-lnet.h:1241:lnet_set_route_aliveness()) route to o2ib2 through 10.0.10.226@o2ib7 has gone from up to down [1918602.748025] Lustre: Skipped 10 previous similar messages [1918616.340584] LustreError: 8158:0:(ldlm_lib.c:3550:target_bulk_io()) @@@ network error on bulk READ req@ffff9d562c73a400 x1770433055749056/t0(0) o3->fbdbce8c-d0c5-404c-ba6a-9493fc356b72@10.50.6.23@o2ib2:544/0 lens 488/440 e 0 to 0 dl 1690246224 ref 1 fl Interpret:/0/0 rc 0/0 job:'25102511' [1918616.341791] LustreError: 8083:0:(sec.c:2543:sptlrpc_svc_unwrap_bulk()) @@@ truncated bulk GET 8388608(12582912) req@ffff9d8255254c80 x1771637160091904/t0(0) o4->f141c6bb-1cd9-4dfb-89a5-cc141fef9639@10.50.16.10@o2ib2:591/0 lens 488/448 e 0 to 0 dl 1690246271 ref 1 fl Interpret:/0/0 rc 0/0 job:'19298444' [1918616.341864] Lustre: fir-OST0068: Bulk IO write error with f141c6bb-1cd9-4dfb-89a5-cc141fef9639 (at 10.50.16.10@o2ib2), client will retry: rc = -110 [1918616.341865] Lustre: Skipped 6 previous similar messages [1918616.412181] LustreError: 8158:0:(ldlm_lib.c:3550:target_bulk_io()) Skipped 5 previous similar messages [1918676.819613] Lustre: 59300:0:(client.c:2295:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1690246144/real 1690246144] req@ffff9d8256744c80 x1770377239485056/t0(0) o106->fir-OST006e@10.50.12.16@o2ib2:15/16 lens 328/280 e 0 to 1 dl 1690246272 ref 1 fl Rpc:XQr/0/ffffffff rc 0/-1 job:'' [1918676.847909] Lustre: 59300:0:(client.c:2295:ptlrpc_expire_one_request()) Skipped 6 previous similar messages [1918677.831513] Lustre: 64049:0:(client.c:2295:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1690246145/real 1690246145] req@ffff9d560ca41200 x1770377239495680/t0(0) o106->fir-OST006e@10.50.1.56@o2ib2:15/16 lens 328/280 e 0 to 1 dl 1690246273 ref 1 fl Rpc:XQr/0/ffffffff rc 0/-1 job:'' [1918729.341979] LNetError: 1533:0:(lib-move.c:4005:lnet_handle_recovery_reply()) peer NI (10.0.10.226@o2ib7) recovery failed with -113 [1918729.353886] LNetError: 1533:0:(lib-move.c:4005:lnet_handle_recovery_reply()) Skipped 39 previous similar messages [1919297.536448] LNetError: 26711:0:(lib-lnet.h:1241:lnet_set_route_aliveness()) route to o2ib3 through 10.0.10.234@o2ib7 has gone from down to up [1919298.584125] LNet: 26687:0:(o2iblnd_cb.c:3418:kiblnd_check_conns()) Timed out tx for 10.0.10.225@o2ib7: 11 seconds [1919298.594553] LNet: 26687:0:(o2iblnd_cb.c:3418:kiblnd_check_conns()) Skipped 1 previous similar message [1919317.418290] LNetError: 26716:0:(lib-lnet.h:1241:lnet_set_route_aliveness()) route to o2ib3 through 10.0.10.233@o2ib7 has gone from down to up [1919349.577368] LNet: 26687:0:(o2iblnd_cb.c:3418:kiblnd_check_conns()) Timed out tx for 10.0.10.225@o2ib7: 2 seconds [1919413.568883] LNet: 26687:0:(o2iblnd_cb.c:3418:kiblnd_check_conns()) Timed out tx for 10.0.10.225@o2ib7: 6 seconds [1919481.346497] Lustre: fir-OST006c: Client 0c4300e8-c98e-45fe-90cd-0edeedbaa459 (at 10.51.13.13@o2ib3) reconnecting [1919481.356841] Lustre: Skipped 392 previous similar messages [1919481.541614] LustreError: 137-5: fir-OST006b_UUID: not available for connect from 10.51.6.7@o2ib3 (no target). If you are running an HA pair check that the target is mounted on the other server. [1919481.558976] LustreError: Skipped 5 previous similar messages [1919485.179149] LustreError: 26711:0:(events.c:476:server_bulk_callback()) event type 5, status -5, desc ffff9d6cf33f9800 [1919485.179524] LNetError: 26664:0:(lib-lnet.h:1241:lnet_set_route_aliveness()) route to o2ib3 through 10.0.10.234@o2ib7 has gone from up to down [1919485.202783] LustreError: 26711:0:(events.c:476:server_bulk_callback()) Skipped 15 previous similar messages [1919486.334460] LNetError: 61811:0:(lib-move.c:4005:lnet_handle_recovery_reply()) peer NI (10.0.10.234@o2ib7) recovery failed with -113 [1919486.346449] LNetError: 61811:0:(lib-move.c:4005:lnet_handle_recovery_reply()) Skipped 5 previous similar messages [1919528.328665] LustreError: 47298:0:(ldlm_lib.c:3550:target_bulk_io()) @@@ network error on bulk READ req@ffff9d561a46c800 x1771660041973440/t0(0) o3->74727e22-5100-4fe0-af70-9d8c0c4f6117@10.51.6.67@o2ib3:696/0 lens 488/440 e 0 to 0 dl 1690247131 ref 1 fl Interpret:/0/0 rc 0/0 job:'24822442' [1919528.330583] Lustre: fir-OST006e: Bulk IO read error with af2e845a-09cd-4ed7-a76f-8648398b5e1e (at 10.51.4.42@o2ib3), client will retry: rc -110 [1919528.330584] Lustre: Skipped 6 previous similar messages [1919528.372958] LustreError: 47298:0:(ldlm_lib.c:3550:target_bulk_io()) Skipped 1 previous similar message [1919589.937478] Lustre: 46743:0:(client.c:2295:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1690247057/real 1690247057] req@ffff9d824ed54800 x1770377251948736/t0(0) o106->fir-OST006e@10.51.1.1@o2ib3:15/16 lens 328/280 e 0 to 1 dl 1690247185 ref 1 fl Rpc:XQr/0/ffffffff rc 0/-1 job:'' [1919589.965580] Lustre: 46743:0:(client.c:2295:ptlrpc_expire_one_request()) Skipped 1 previous similar message [1919590.056855] LustreError: 132-0: fir-OST006a: BAD READ CHECKSUM: should have changed on the client or in transit: from 10.51.16.2@o2ib3 inode [0x2c006d705:0x11921:0x0] object 0x1d80000402:290884061 extent [0-8191], client returned csum 0 (type 20), server csum 3dd013a (type 20) [1919699.689250] LNetError: 26715:0:(lib-lnet.h:1241:lnet_set_route_aliveness()) route to o2ib2 through 10.0.10.225@o2ib7 has gone from down to up [1920005.471296] LNetError: 26712:0:(lib-lnet.h:1241:lnet_set_route_aliveness()) route to o2ib2 through 10.0.10.226@o2ib7 has gone from down to up [1920508.320926] LNetError: 61811:0:(lib-move.c:4005:lnet_handle_recovery_reply()) peer NI (10.0.10.234@o2ib7) recovery failed with -113 [1920508.332916] LNetError: 61811:0:(lib-move.c:4005:lnet_handle_recovery_reply()) Skipped 9 previous similar messages [1920715.422439] LNetError: 26714:0:(lib-lnet.h:1241:lnet_set_route_aliveness()) route to o2ib3 through 10.0.10.234@o2ib7 has gone from down to up [1921345.980824] Lustre: fir-OST006e: Client c659e398-fd4a-4afb-9191-c752140c1bf8 (at 10.51.9.64@o2ib3) reconnecting [1921345.991116] Lustre: Skipped 70 previous similar messages [1921347.626447] LustreError: 8072:0:(ldlm_lib.c:3544:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff9d8ba459da00 x1770420757913088/t0(0) o3->0747785f-2d93-4b25-b412-6fce44bc0833@10.51.5.61@o2ib3:316/0 lens 488/440 e 0 to 0 dl 1690249016 ref 1 fl Interpret:/0/0 rc 0/0 job:'25052392' [1921348.227444] Lustre: fir-OST006a: Bulk IO write error with 0c4300e8-c98e-45fe-90cd-0edeedbaa459 (at 10.51.13.13@o2ib3), client will retry: rc = -110 [1921351.089960] LustreError: 26711:0:(events.c:476:server_bulk_callback()) event type 5, status -5, desc ffff9d84ad7d8000 [1921351.090004] Lustre: 24363:0:(client.c:2295:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1690248922/real 1690248947] req@ffff9d82023ca880 x1770377278283264/t0(0) o106->fir-OST0068@10.51.8.38@o2ib3:15/16 lens 328/280 e 0 to 1 dl 1690249050 ref 1 fl Rpc:eXQr/0/ffffffff rc 0/-1 job:'' [1921351.090092] LNetError: 26664:0:(lib-lnet.h:1241:lnet_set_route_aliveness()) route to o2ib3 through 10.0.10.236@o2ib7 has gone from up to down [1921351.090188] Lustre: fir-OST0068: Bulk IO read error with 0747785f-2d93-4b25-b412-6fce44bc0833 (at 10.51.5.61@o2ib3), client will retry: rc -110 [1921351.090189] Lustre: Skipped 1 previous similar message [1921351.160535] LustreError: 26711:0:(events.c:476:server_bulk_callback()) Skipped 20 previous similar messages [1921352.316122] LNetError: 4189:0:(lib-move.c:4005:lnet_handle_recovery_reply()) peer NI (10.0.10.227@o2ib7) recovery failed with -113 [1921376.312674] LustreError: 8128:0:(ldlm_lib.c:3550:target_bulk_io()) @@@ network error on bulk READ req@ffff9d82c75fcc80 x1771640378699712/t0(0) o3->09e7f960-f848-4ceb-b73b-ca8aaba50c92@10.51.14.9@o2ib3:295/0 lens 488/440 e 0 to 0 dl 1690248995 ref 1 fl Interpret:/0/0 rc 0/0 job:'' [1921376.312736] Lustre: fir-OST006a: Bulk IO read error with 6cdb87ed-fb72-4da8-8b6e-e0a7fc6edb62 (at 10.50.10.6@o2ib2), client will retry: rc -110 [1921376.312738] Lustre: Skipped 1 previous similar message [1921376.356001] LustreError: 8128:0:(ldlm_lib.c:3550:target_bulk_io()) Skipped 1 previous similar message [1921378.038176] LustreError: 137-5: fir-OST006d_UUID: not available for connect from 10.50.1.66@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. [1921378.055638] LustreError: Skipped 2 previous similar messages [1921381.410458] LustreError: 137-5: fir-OST006b_UUID: not available for connect from 10.50.1.52@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. [1921386.289423] LustreError: 137-5: fir-OST0069_UUID: not available for connect from 10.51.6.29@o2ib3 (no target). If you are running an HA pair check that the target is mounted on the other server. [1921388.797272] LustreError: 137-5: fir-OST0069_UUID: not available for connect from 10.51.7.51@o2ib3 (no target). If you are running an HA pair check that the target is mounted on the other server. [1921388.814852] LustreError: Skipped 2 previous similar messages [1921392.809025] LustreError: 137-5: fir-OST006b_UUID: not available for connect from 10.51.13.10@o2ib3 (no target). If you are running an HA pair check that the target is mounted on the other server. [1921392.826563] LustreError: Skipped 2 previous similar messages [1921400.309494] LustreError: 57463:0:(ldlm_lib.c:3550:target_bulk_io()) @@@ network error on bulk READ req@ffff9d82571c3600 x1770431078051712/t0(0) o3->cff55eb0-0e53-4b6e-a68a-9c618873265f@10.50.5.52@o2ib2:322/0 lens 488/440 e 1 to 0 dl 1690249022 ref 1 fl Interpret:/0/0 rc 0/0 job:'25070664' [1921400.309569] Lustre: fir-OST0068: Bulk IO read error with fe2092f1-d21d-4870-97e4-e7ac72627b76 (at 10.50.13.8@o2ib2), client will retry: rc -110 [1921400.309570] Lustre: Skipped 1 previous similar message [1921400.353596] LustreError: 57463:0:(ldlm_lib.c:3550:target_bulk_io()) Skipped 5 previous similar messages [1921400.971848] LustreError: 137-5: fir-OST006f_UUID: not available for connect from 10.50.8.11@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. [1921400.989303] LustreError: Skipped 17 previous similar messages [1921415.303487] LNet: 26687:0:(o2iblnd_cb.c:3418:kiblnd_check_conns()) Timed out tx for 10.0.10.224@o2ib7: 1 seconds [1921423.627093] Lustre: fir-OST0068: Client fac1ad2b-3bd5-459a-a792-71b5fb6f1068 (at 10.50.0.64@o2ib2) reconnecting [1921423.637560] Lustre: Skipped 871 previous similar messages [1921430.656444] LustreError: 58392:0:(ldlm_lockd.c:259:expired_lock_main()) ### lock callback timer expired after 100s: evicting client at 10.50.0.64@o2ib2 ns: filter-fir-OST006a_UUID lock: ffff9d84aacd7500/0x2969ebf396ae4886 lrc: 3/0,0 mode: PR/PR res: [0x1045df9:0x0:0x0].0x0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->16777215) gid 0 flags: 0x60000400010020 nid: 10.50.0.64@o2ib2 remote: 0x15ebfb07af220a3b expref: 43 pid: 46734 timeout: 1921680 lvb_type: 1 [1921430.697543] LustreError: 58392:0:(ldlm_lockd.c:259:expired_lock_main()) Skipped 1 previous similar message [1921453.298446] LNet: 26687:0:(o2iblnd_cb.c:3418:kiblnd_check_conns()) Timed out tx for 10.0.10.224@o2ib7: 0 seconds [1921454.196322] Lustre: 63828:0:(client.c:2295:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1690248922/real 1690248922] req@ffff9dadd397ad00 x1770377278283072/t0(0) o106->fir-OST006c@10.51.8.48@o2ib3:15/16 lens 328/280 e 0 to 1 dl 1690249050 ref 1 fl Rpc:XQr/0/ffffffff rc 0/-1 job:'' [1921454.224513] Lustre: 63828:0:(client.c:2295:ptlrpc_expire_one_request()) Skipped 1 previous similar message [1921455.622134] Lustre: 58403:0:(client.c:2295:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1690248923/real 1690248923] req@ffff9d82c4f30900 x1770377278294336/t0(0) o106->fir-OST006c@10.50.1.17@o2ib2:15/16 lens 328/280 e 0 to 1 dl 1690249051 ref 1 fl Rpc:XQr/0/ffffffff rc 0/-1 job:'' [1921455.650326] Lustre: 58403:0:(client.c:2295:ptlrpc_expire_one_request()) Skipped 13 previous similar messages [1921468.323447] Lustre: 22454:0:(client.c:2295:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1690248936/real 1690248936] req@ffff9d82020cbf00 x1770377278327424/t0(0) o104->fir-OST0068@10.50.5.30@o2ib2:15/16 lens 328/224 e 0 to 1 dl 1690249064 ref 1 fl Rpc:XQr/0/ffffffff rc 0/-1 job:'' [1921484.052741] LNetError: 26716:0:(lib-lnet.h:1241:lnet_set_route_aliveness()) route to o2ib2 through 10.0.10.227@o2ib7 has gone from down to up [1921488.715420] LNetError: 26711:0:(lib-lnet.h:1241:lnet_set_route_aliveness()) route to o2ib2 through 10.0.10.224@o2ib7 has gone from down to up [1922374.318118] LNetError: 4189:0:(lib-move.c:4005:lnet_handle_recovery_reply()) peer NI (10.0.10.236@o2ib7) recovery failed with -113 [1922374.330022] LNetError: 4189:0:(lib-move.c:4005:lnet_handle_recovery_reply()) Skipped 30 previous similar messages [1922557.734866] LNetError: 26713:0:(lib-lnet.h:1241:lnet_set_route_aliveness()) route to o2ib3 through 10.0.10.236@o2ib7 has gone from down to up [1922577.885157] LNetError: 26715:0:(lib-lnet.h:1241:lnet_set_route_aliveness()) route to o2ib3 through 10.0.10.235@o2ib7 has gone from down to up [1925926.167878] Lustre: fir-OST0068: Client 8da7b63b-e5e0-4793-a214-de2ad0bd0382 (at 10.51.1.30@o2ib3) reconnecting [1925926.178134] Lustre: Skipped 26 previous similar messages [1925927.612154] LustreError: 8160:0:(ldlm_lib.c:3544:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff9d8baa3db180 x1770418936755008/t0(0) o3->f2833009-a3c3-4ca8-931d-e3f082378209@10.51.2.27@o2ib3:367/0 lens 488/440 e 0 to 0 dl 1690253597 ref 1 fl Interpret:/0/0 rc 0/0 job:'25085738' [1925927.637706] LustreError: 8160:0:(ldlm_lib.c:3544:target_bulk_io()) Skipped 2 previous similar messages [1925931.054898] LustreError: 26711:0:(events.c:476:server_bulk_callback()) event type 5, status -5, desc ffff9d575437e800 [1925931.055166] Lustre: fir-OST0068: Bulk IO read error with f2833009-a3c3-4ca8-931d-e3f082378209 (at 10.51.2.27@o2ib3), client will retry: rc -110 [1925931.055167] Lustre: Skipped 5 previous similar messages [1925931.055182] LNetError: 26687:0:(lib-lnet.h:1241:lnet_set_route_aliveness()) route to o2ib3 through 10.0.10.238@o2ib7 has gone from up to down [1925931.055184] LNetError: 26687:0:(lib-lnet.h:1241:lnet_set_route_aliveness()) Skipped 3 previous similar messages [1925931.107470] LustreError: 26711:0:(events.c:476:server_bulk_callback()) Skipped 32 previous similar messages [1925931.302959] LNetError: 10471:0:(lib-move.c:4005:lnet_handle_recovery_reply()) peer NI (10.0.10.238@o2ib7) recovery failed with -113 [1925931.315077] LNetError: 10471:0:(lib-move.c:4005:lnet_handle_recovery_reply()) Skipped 1 previous similar message [1925931.510617] LustreError: 8177:0:(ldlm_lib.c:3544:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff9d562f2a1200 x1770434517813440/t0(0) o3->b2fb6051-c09d-4215-8543-69821d22238f@10.50.10.63@o2ib2:370/0 lens 488/440 e 0 to 0 dl 1690253600 ref 1 fl Interpret:/0/0 rc 0/0 job:'19277418' [1925931.532410] LustreError: 137-5: fir-OST0069_UUID: not available for connect from 10.50.1.49@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. [1925931.532411] LustreError: Skipped 4 previous similar messages [1925931.664627] Lustre: fir-OST0068: Bulk IO read error with cff55eb0-0e53-4b6e-a68a-9c618873265f (at 10.50.5.52@o2ib2), client will retry: rc -110 [1925931.677659] Lustre: Skipped 1 previous similar message [1925932.581476] LustreError: 8128:0:(ldlm_lib.c:3544:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffff9d8202ad0480 x1770435879312448/t0(0) o4->66359a2a-ee4d-450e-8e29-b23a791e4a1b@10.50.8.23@o2ib2:371/0 lens 488/448 e 0 to 0 dl 1690253601 ref 1 fl Interpret:/0/0 rc 0/0 job:'19298448' [1925932.606907] LustreError: 8128:0:(ldlm_lib.c:3544:target_bulk_io()) Skipped 1 previous similar message [1925932.616351] Lustre: fir-OST006c: Bulk IO write error with 66359a2a-ee4d-450e-8e29-b23a791e4a1b (at 10.50.8.23@o2ib2), client will retry: rc = -110 [1925933.756518] LustreError: 137-5: fir-OST006d_UUID: not available for connect from 10.50.9.35@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. [1925933.773976] LustreError: Skipped 2 previous similar messages [1925942.140492] LustreError: 137-5: fir-OST006b_UUID: not available for connect from 10.50.3.13@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. [1925942.157940] LustreError: Skipped 8 previous similar messages [1925945.290880] Lustre: fir-OST006c: Client c027f381-1762-4c93-a2b8-9ae95d9cb2c5 (at 10.50.3.28@o2ib2) reconnecting [1925945.301322] Lustre: Skipped 792 previous similar messages [1925955.344238] LustreError: 137-5: fir-OST0069_UUID: not available for connect from 10.51.7.68@o2ib3 (no target). If you are running an HA pair check that the target is mounted on the other server. [1925959.298972] LustreError: 8191:0:(sec.c:2543:sptlrpc_svc_unwrap_bulk()) @@@ truncated bulk GET 0(18667) req@ffff9d8b80036780 x1770623142585152/t0(0) o4->443a3bab-3fb2-4a3a-a561-2ccda54a08b9@10.50.8.43@o2ib2:346/0 lens 488/448 e 0 to 0 dl 1690253576 ref 1 fl Interpret:/0/0 rc 0/0 job:'25058957' [1925959.298985] Lustre: fir-OST006e: Bulk IO write error with 443a3bab-3fb2-4a3a-a561-2ccda54a08b9 (at 10.50.8.43@o2ib2), client will retry: rc = -110 [1925959.299047] LustreError: 47299:0:(ldlm_lib.c:3550:target_bulk_io()) @@@ network error on bulk READ req@ffff9dadd3c74800 x1770596670374464/t0(0) o3->50f0fb76-9bd6-4818-abdf-4e50cfcd1d66@10.51.12.1@o2ib3:349/0 lens 488/440 e 0 to 0 dl 1690253579 ref 1 fl Interpret:/0/0 rc 0/0 job:'25034867' [1925959.299102] Lustre: fir-OST006c: Bulk IO read error with 50f0fb76-9bd6-4818-abdf-4e50cfcd1d66 (at 10.51.12.1@o2ib3), client will retry: rc -110 [1925959.377170] LustreError: 8191:0:(sec.c:2543:sptlrpc_svc_unwrap_bulk()) Skipped 1 previous similar message [1925973.081207] LustreError: 137-5: fir-OST006d_UUID: not available for connect from 10.51.8.40@o2ib3 (no target). If you are running an HA pair check that the target is mounted on the other server. [1925973.098796] LustreError: Skipped 7 previous similar messages [1925983.086905] Lustre: fir-OST006e: Client 420924d5-8bda-4ec0-a5f5-03b0312f5f89 (at 10.50.4.13@o2ib2) reconnecting [1925983.097162] Lustre: Skipped 76 previous similar messages [1925983.295822] LustreError: 57513:0:(ldlm_lib.c:3550:target_bulk_io()) @@@ network error on bulk READ req@ffff9d6d68376050 x1770423457352640/t0(0) o3->faedde90-7e82-4946-874c-8e6733045ece@10.50.5.42@o2ib2:375/0 lens 488/440 e 1 to 0 dl 1690253605 ref 1 fl Interpret:/0/0 rc 0/0 job:'25061372' [1925983.295889] Lustre: fir-OST006c: Bulk IO read error with 50f0fb76-9bd6-4818-abdf-4e50cfcd1d66 (at 10.51.12.1@o2ib3), client will retry: rc -110 [1925983.295890] Lustre: Skipped 3 previous similar messages [1925983.340178] LustreError: 57513:0:(ldlm_lib.c:3550:target_bulk_io()) Skipped 9 previous similar messages [1925996.698024] LNet: 26687:0:(o2iblnd_cb.c:3418:kiblnd_check_conns()) Timed out tx for 10.0.10.224@o2ib7: 3 seconds [1926022.694598] LNet: 26687:0:(o2iblnd_cb.c:3418:kiblnd_check_conns()) Timed out tx for 10.0.10.224@o2ib7: 11 seconds [1926035.032990] Lustre: 31124:0:(client.c:2295:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1690253503/real 1690253503] req@ffff9d56348e6780 x1770377359340416/t0(0) o106->fir-OST006c@10.51.1.36@o2ib3:15/16 lens 328/280 e 0 to 1 dl 1690253631 ref 1 fl Rpc:XQr/0/ffffffff rc 0/-1 job:'' [1926035.061361] Lustre: 31124:0:(client.c:2295:ptlrpc_expire_one_request()) Skipped 1 previous similar message [1926037.483674] Lustre: 46651:0:(client.c:2295:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1690253506/real 1690253506] req@ffff9d5622483f00 x1770377359346432/t0(0) o104->fir-OST006a@10.51.9.1@o2ib3:15/16 lens 328/224 e 0 to 1 dl 1690253634 ref 1 fl Rpc:XQr/0/ffffffff rc 0/-1 job:'' [1926037.552118] LNetError: 1533:0:(lib-move.c:4005:lnet_handle_recovery_reply()) peer NI (10.0.10.224@o2ib7) recovery failed with -113 [1926037.564038] LNetError: 1533:0:(lib-move.c:4005:lnet_handle_recovery_reply()) Skipped 27 previous similar messages [1926058.286940] Lustre: fir-OST006a: Client b2fb6051-c09d-4215-8543-69821d22238f (at 10.50.10.63@o2ib2) reconnecting [1926058.297404] Lustre: Skipped 23 previous similar messages [1926064.609101] LNetError: 26717:0:(lib-lnet.h:1241:lnet_set_route_aliveness()) route to o2ib2 through 10.0.10.227@o2ib7 has gone from down to up [1926076.212346] LustreError: 132-0: fir-OST006a: BAD READ CHECKSUM: should have changed on the client or in transit: from 10.51.16.2@o2ib3 inode [0x20005c30f:0x12b43:0x0] object 0x0:12352852 extent [0-49151], client returned csum 0 (type 20), server csum ae3b0d19 (type 20) [1926077.000162] LustreError: 137-5: fir-OST006b_UUID: not available for connect from 10.50.16.11@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. [1926077.017704] LustreError: Skipped 19 previous similar messages [1926230.285493] LNetError: 14599:0:(lib-move.c:4005:lnet_handle_recovery_reply()) peer NI (10.0.10.224@o2ib7) recovery failed with -113 [1926230.297807] LNetError: 14599:0:(lib-move.c:4005:lnet_handle_recovery_reply()) Skipped 9 previous similar messages [1926953.256392] LNetError: 11129:0:(lib-move.c:4005:lnet_handle_recovery_reply()) peer NI (10.0.10.238@o2ib7) recovery failed with -113 [1926953.268386] LNetError: 11129:0:(lib-move.c:4005:lnet_handle_recovery_reply()) Skipped 4 previous similar messages [1927033.561591] LNet: 26687:0:(o2iblnd_cb.c:3418:kiblnd_check_conns()) Timed out tx for 10.0.10.224@o2ib7: 2 seconds [1927046.559881] LNet: 26687:0:(o2iblnd_cb.c:3418:kiblnd_check_conns()) Timed out tx for 10.0.10.224@o2ib7: 0 seconds [1927097.553166] LNet: 26687:0:(o2iblnd_cb.c:3418:kiblnd_check_conns()) Timed out tx for 10.0.10.224@o2ib7: 6 seconds [1927123.730262] LNetError: 26718:0:(lib-lnet.h:1241:lnet_set_route_aliveness()) route to o2ib3 through 10.0.10.237@o2ib7 has gone from down to up [1927144.137172] LNetError: 26718:0:(lib-lnet.h:1241:lnet_set_route_aliveness()) route to o2ib3 through 10.0.10.239@o2ib7 has gone from down to up [1927149.400636] LNetError: 26711:0:(lib-lnet.h:1241:lnet_set_route_aliveness()) route to o2ib3 through 10.0.10.238@o2ib7 has gone from down to up [1927222.750610] Lustre: fir-OST006e: Client 65b5810b-6f93-4458-9068-a97f49487296 (at 10.51.1.57@o2ib3) reconnecting [1927222.760870] Lustre: Skipped 14 previous similar messages [1927223.905515] LustreError: 8170:0:(ldlm_lib.c:3544:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff9d560f7bcc80 x1770437662969024/t0(0) o3->ec8b1a46-1fc6-42a8-beee-bce5bff16c20@10.51.17.18@o2ib3:153/0 lens 488/440 e 0 to 0 dl 1690254893 ref 1 fl Interpret:/0/0 rc 0/0 job:'24609904' [1927223.976131] LustreError: 137-5: fir-OST006d_UUID: not available for connect from 10.51.1.47@o2ib3 (no target). If you are running an HA pair check that the target is mounted on the other server. [1927230.107796] LustreError: 26716:0:(events.c:476:server_bulk_callback()) event type 5, status -5, desc ffff9d5865c9e400 [1927230.108176] LNetError: 26687:0:(lib-lnet.h:1241:lnet_set_route_aliveness()) route to o2ib3 through 10.0.10.237@o2ib7 has gone from up to down [1927230.108178] LNetError: 26687:0:(lib-lnet.h:1241:lnet_set_route_aliveness()) Skipped 4 previous similar messages [1927230.118587] LNetError: 26718:0:(o2iblnd_cb.c:1066:kiblnd_tx_complete()) Received an event on a freed tx: ffffb494f36fedd8 status -103 [1927230.153867] LustreError: 26716:0:(events.c:476:server_bulk_callback()) Skipped 1 previous similar message [1927255.237393] LustreError: 57518:0:(ldlm_lib.c:3550:target_bulk_io()) @@@ network error on bulk READ req@ffff9d826366a880 x1771641140611520/t0(0) o3->09e7f960-f848-4ceb-b73b-ca8aaba50c92@10.51.14.9@o2ib3:133/0 lens 488/440 e 0 to 0 dl 1690254873 ref 1 fl Interpret:/0/0 rc 0/0 job:'' [1927255.237502] Lustre: fir-OST006a: Bulk IO read error with a6928e12-7da2-4554-9563-2172d2f3d36d (at 10.51.17.10@o2ib3), client will retry: rc -110 [1927255.237502] Lustre: Skipped 6 previous similar messages [1927255.238708] LustreError: 57505:0:(sec.c:2543:sptlrpc_svc_unwrap_bulk()) @@@ truncated bulk GET 0(37324) req@ffff9d560ccab180 x1770420883036800/t0(0) o4->65b5810b-6f93-4458-9068-a97f49487296@10.51.1.57@o2ib3:133/0 lens 488/448 e 0 to 0 dl 1690254873 ref 1 fl Interpret:/0/0 rc 0/0 job:'25058878' [1927255.238719] Lustre: fir-OST0068: Bulk IO write error with 65b5810b-6f93-4458-9068-a97f49487296 (at 10.51.1.57@o2ib3), client will retry: rc = -110 [1927255.238720] Lustre: Skipped 1 previous similar message [1927255.326046] LustreError: 57518:0:(ldlm_lib.c:3550:target_bulk_io()) Skipped 1 previous similar message [1927331.334013] Lustre: fir-OST006e: Client bf043d57-9d32-4837-8281-b06e508ea9a6 (at 10.51.1.51@o2ib3) reconnecting [1927331.344278] Lustre: Skipped 60 previous similar messages [1927331.615321] Lustre: 64020:0:(client.c:2295:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1690254800/real 1690254800] req@ffff9d5622d07980 x1770377378442240/t0(0) o106->fir-OST006a@10.51.8.35@o2ib3:15/16 lens 328/280 e 0 to 1 dl 1690254928 ref 1 fl Rpc:XQr/0/ffffffff rc 0/-1 job:'' [1927331.643522] Lustre: 64020:0:(client.c:2295:ptlrpc_expire_one_request()) Skipped 1 previous similar message [1927333.964009] Lustre: 47088:0:(client.c:2295:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1690254802/real 1690254802] req@ffff9d825b27cc80 x1770377378471104/t0(0) o105->fir-OST0068@10.51.7.60@o2ib3:15/16 lens 392/224 e 0 to 1 dl 1690254930 ref 1 fl Rpc:XQr/0/ffffffff rc 0/-1 job:'' [1927333.992205] Lustre: 47088:0:(client.c:2295:ptlrpc_expire_one_request()) Skipped 1 previous similar message [1927361.489487] LNetError: 26687:0:(o2iblnd.c:992:kiblnd_destroy_conn()) ASSERTION( conn->ibc_nsends_posted == 0 ) failed: [1927361.500446] LNetError: 26687:0:(o2iblnd.c:992:kiblnd_destroy_conn()) LBUG [1927361.507427] Pid: 26687, comm: kiblnd_connd 3.10.0-1160.90.1.el7_lustre.pl1.x86_64 #1 SMP Tue Jun 20 15:47:49 PDT 2023 [1927361.518344] Call Trace: [1927361.520991] [<0>] libcfs_call_trace+0x90/0xf0 [libcfs] [1927361.526319] [<0>] lbug_with_loc+0x4c/0xa0 [libcfs] [1927361.531308] [<0>] kiblnd_destroy_conn+0x476/0x650 [ko2iblnd] [1927361.537253] [<0>] kiblnd_connd+0xfa/0xcb0 [ko2iblnd] [1927361.542393] [<0>] kthread+0xd1/0xe0 [1927361.546067] [<0>] ret_from_fork_nospec_begin+0x7/0x21 [1927361.551328] [<0>] 0xfffffffffffffffe [1927361.555087] Kernel panic - not syncing: LBUG [1927361.559528] CPU: 59 PID: 26687 Comm: kiblnd_connd Kdump: loaded Tainted: G OE ------------ 3.10.0-1160.90.1.el7_lustre.pl1.x86_64 #1 [1927361.572724] Hardware name: Dell Inc. PowerEdge R6525/0N7YGH, BIOS 2.11.3 02/24/2023 [1927361.580548] Call Trace: [1927361.583167] [] dump_stack+0x19/0x1f [1927361.588478] [] panic+0xe8/0x21f [1927361.593445] [] lbug_with_loc+0x9b/0xa0 [libcfs] [1927361.599796] [] kiblnd_destroy_conn+0x476/0x650 [ko2iblnd] [1927361.607016] [] kiblnd_connd+0xfa/0xcb0 [ko2iblnd] [1927361.613541] [] ? wake_up_atomic_t+0x40/0x40 [1927361.619546] [] ? kiblnd_cm_callback+0x2140/0x2140 [ko2iblnd] [1927361.627023] [] kthread+0xd1/0xe0 [1927361.632074] [] ? insert_kthread_work+0x40/0x40 [1927361.638774] [] ret_from_fork_nospec_begin+0x7/0x21 [1927361.645387] [] ? insert_kthread_work+0x40/0x40