Details
-
Bug
-
Resolution: Unresolved
-
Blocker
OS: CentOS Linux release 7.6.1810
Kernel: 3.10.0-957.10.1.el7_lustre.x86_64
OFED: MLNX_OFED_LINUX-4.6-1.0.1.1
Lustre version: 2.12.2 (clients and servers)
Description
We have two OSSs (oss1 and oss2). Recently, lctl ping from some clients to oss2 fails for a while, then succeeds, then fails again, and so on:
[root@d2704 ~]# lctl ping 10.10.2.22@o2ib
12345-0@lo
12345-10.10.2.22@o2ib
[root@d2704 ~]# lctl ping 10.10.2.22@o2ib
failed to ping 10.10.2.22@o2ib: Input/output error
[root@d2704 ~]# lctl ping 10.10.2.22@o2ib
failed to ping 10.10.2.22@o2ib: Input/output error
[root@d2704 ~]# lctl ping 10.10.2.22@o2ib
failed to ping 10.10.2.22@o2ib: Input/output error
[root@d2704 ~]# lctl ping 10.10.2.22@o2ib
failed to ping 10.10.2.22@o2ib: Input/output error
[root@d2704 ~]# lctl ping 10.10.2.22@o2ib
failed to ping 10.10.2.22@o2ib: Input/output error
[root@d2704 ~]# lctl ping 10.10.2.22@o2ib
failed to ping 10.10.2.22@o2ib: Input/output error
However, lctl ping to oss1 always works fine:
[root@d2704 ~]# lctl ping 10.10.2.21@o2ib
12345-0@lo
12345-10.10.2.21@o2ib
[root@d2704 ~]# lctl ping 10.10.2.21@o2ib
12345-0@lo
12345-10.10.2.21@o2ib
[root@d2704 ~]# lctl ping 10.10.2.21@o2ib
12345-0@lo
12345-10.10.2.21@o2ib
[root@d2704 ~]# lctl ping 10.10.2.21@o2ib
12345-0@lo
12345-10.10.2.21@o2ib
[root@d2704 ~]# lctl ping 10.10.2.21@o2ib
12345-0@lo
12345-10.10.2.21@o2ib
[root@d2704 ~]# lctl ping 10.10.2.21@o2ib
12345-0@lo
12345-10.10.2.21@o2ib
[root@d2704 ~]# lctl ping 10.10.2.21@o2ib
12345-0@lo
12345-10.10.2.21@o2ib
[root@d2704 ~]# lctl ping 10.10.2.21@o2ib
12345-0@lo
12345-10.10.2.21@o2ib
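To measure how long each failing and working window lasts, the manual pings above could be automated. The following is only a minimal sketch, not something that was run for this report: the function names (ping_once, watch, transitions) are hypothetical, and it assumes that a failed lctl ping either exits non-zero or prints "failed to ping", as in the output shown above.

```python
import subprocess
import time
from datetime import datetime


def ping_once(nid, run=subprocess.run):
    """Return True if `lctl ping <nid>` appears to succeed.

    Assumption: a failure shows up as a non-zero exit status or as the
    "failed to ping" message seen in the report.
    """
    result = run(["lctl", "ping", nid], capture_output=True, text=True)
    output = (result.stdout or "") + (result.stderr or "")
    return result.returncode == 0 and "failed to ping" not in output


def watch(nid, count, interval=1.0, ping=ping_once, sleep=time.sleep):
    """Ping `nid` `count` times and return a list of (timestamp, ok) tuples."""
    results = []
    for i in range(count):
        stamp = datetime.now().isoformat(timespec="seconds")
        results.append((stamp, ping(nid)))
        if i + 1 < count:
            sleep(interval)
    return results


def transitions(results):
    """Keep only the samples where the state flips between ok and failed."""
    changes = []
    previous = None
    for stamp, ok in results:
        if ok != previous:
            changes.append((stamp, ok))
            previous = ok
    return changes
```

For example, transitions(watch("10.10.2.22@o2ib", count=600)) would return one entry per state change over roughly ten minutes, which can then be lined up against the dmesg timestamps on the client and on oss2.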
From the client's dmesg, I can see:
[Wed Mar 18 22:28:15 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:28:21 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:28:27 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:28:33 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:28:39 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:28:45 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:28:51 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:28:57 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:29:03 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:29:09 2020] LNetError: 25711:0:(o2iblnd_cb.c:3335:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds
[Wed Mar 18 22:29:09 2020] LNetError: 25711:0:(o2iblnd_cb.c:3335:kiblnd_check_txs_locked()) Skipped 63 previous similar messages
[Wed Mar 18 22:29:09 2020] LNetError: 25711:0:(o2iblnd_cb.c:3410:kiblnd_check_conns()) Timed out RDMA with 10.10.2.22@o2ib (0): c: 2, oc: 0, rc: 8
[Wed Mar 18 22:29:09 2020] LNetError: 25711:0:(o2iblnd_cb.c:3410:kiblnd_check_conns()) Skipped 63 previous similar messages
[Wed Mar 18 22:29:09 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:29:15 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:29:21 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:29:27 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:29:33 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:29:39 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:29:45 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:29:51 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:29:57 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:30:03 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:30:09 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:30:10 2020] LNet: 25711:0:(o2iblnd_cb.c:3381:kiblnd_check_conns()) Timed out tx for 10.10.2.22@o2ib: 6 seconds
[Wed Mar 18 22:31:06 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:31:12 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:31:18 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:31:24 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:31:30 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:31:36 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:31:42 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:31:48 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:31:54 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:32:00 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:32:06 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:32:12 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:32:18 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:32:24 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:32:30 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:32:36 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:32:42 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:32:48 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:32:54 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:33:00 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:33:06 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:33:12 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:33:18 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:33:24 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:33:30 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:33:36 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:33:36 2020] Lustre: 25727:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1584542013/real 1584542013] req@ffff8d3396b56780 x1661492078327264/t0(0) o3->public1-OST0003-osc-ffff8d3ae8ab1800@10.10.2.22@o2ib:6/4 lens 488/440 e 0 to 1 dl 1584542100 ref 2 fl Rpc:eX/2/ffffffff rc 0/-1
[Wed Mar 18 22:33:36 2020] Lustre: 25727:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 84 previous similar messages
[Wed Mar 18 22:33:42 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:33:48 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:33:54 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:34:00 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:34:00 2020] Lustre: public1-OST0003-osc-ffff8d3ae8ab1800: Connection to public1-OST0003 (at 10.10.2.22@o2ib) was lost; in progress operations using this service will wait for recovery to complete
[Wed Mar 18 22:34:00 2020] Lustre: Skipped 84 previous similar messages
[Wed Mar 18 22:34:00 2020] Lustre: public1-OST0003-osc-ffff8d3ae8ab1800: Connection restored to 10.10.2.22@o2ib (at 10.10.2.22@o2ib)
[Wed Mar 18 22:34:00 2020] Lustre: Skipped 84 previous similar messages
[Wed Mar 18 22:34:06 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:34:12 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:34:18 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:34:24 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:34:30 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:34:36 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:34:42 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:34:48 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:34:54 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:35:00 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:35:06 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:35:12 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:35:18 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:35:24 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:35:30 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:35:36 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:35:42 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:35:48 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:35:54 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:36:00 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:36:06 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:36:12 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:36:18 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
And from oss2's dmesg, I can see:
[Wed Mar 18 22:32:19 2020] Lustre: Skipped 91 previous similar messages
[Wed Mar 18 22:32:25 2020] LustreError: 15078:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979eea197e00
[Wed Mar 18 22:32:31 2020] LustreError: 15079:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9793a0362000
[Wed Mar 18 22:32:37 2020] LustreError: 15080:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979eea190200
[Wed Mar 18 22:32:43 2020] LustreError: 15078:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff97900a61d600
[Wed Mar 18 22:32:49 2020] LustreError: 15079:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff97900a61f000
[Wed Mar 18 22:32:55 2020] LustreError: 15081:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979053a59600
[Wed Mar 18 22:33:01 2020] LustreError: 15078:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979c94ac2a00
[Wed Mar 18 22:33:07 2020] LustreError: 15079:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979053242400
[Wed Mar 18 22:33:13 2020] LustreError: 15080:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979045c19e00
[Wed Mar 18 22:33:19 2020] LustreError: 15078:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979045c1e600
[Wed Mar 18 22:33:25 2020] LustreError: 15079:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9799b8a7b200
[Wed Mar 18 22:33:31 2020] LustreError: 15078:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9798c256ec00
[Wed Mar 18 22:33:37 2020] LustreError: 15081:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9799f3171c00
[Wed Mar 18 22:33:43 2020] LustreError: 15080:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979df0abaa00
[Wed Mar 18 22:33:49 2020] LustreError: 15079:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979702b22c00
[Wed Mar 18 22:33:55 2020] LustreError: 15078:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9796bad34a00
[Wed Mar 18 22:34:53 2020] LustreError: 15081:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9790551f9800
[Wed Mar 18 22:34:59 2020] LustreError: 15081:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979b984fa800
[Wed Mar 18 22:35:05 2020] LustreError: 15081:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9796e069f200
[Wed Mar 18 22:35:11 2020] LustreError: 15078:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979047b68800
[Wed Mar 18 22:35:17 2020] LustreError: 15078:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979dc5cf2c00
[Wed Mar 18 22:35:23 2020] LustreError: 15081:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979048e77a00
[Wed Mar 18 22:35:29 2020] LustreError: 15081:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979dc5cf1800
[Wed Mar 18 22:35:35 2020] LustreError: 15078:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff97923f8be800
[Wed Mar 18 22:35:41 2020] LustreError: 15081:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9794df7b5a00
[Wed Mar 18 22:35:47 2020] LustreError: 15081:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979047b69200
[Wed Mar 18 22:35:47 2020] Lustre: public1-OST0003: Connection restored to (at 10.10.4.4@o2ib)
[Wed Mar 18 22:35:47 2020] Lustre: Skipped 280 previous similar messages
[Wed Mar 18 22:35:53 2020] LustreError: 15080:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff97a016724800
[Wed Mar 18 22:35:59 2020] LustreError: 15081:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979dc6b84600
[Wed Mar 18 22:36:05 2020] LustreError: 15078:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9796bad37000
[Wed Mar 18 22:36:11 2020] LustreError: 15079:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979060264200
[Wed Mar 18 22:36:17 2020] LustreError: 15078:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9795ae4bcc00
[Wed Mar 18 22:36:23 2020] LustreError: 15081:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9797ce935a00
[Wed Mar 18 22:36:29 2020] LustreError: 15079:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff97987de45800
[Wed Mar 18 22:36:35 2020] LustreError: 15080:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979060261400
[Wed Mar 18 22:36:41 2020] LustreError: 15081:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979802423c00
[Wed Mar 18 22:36:47 2020] LustreError: 15080:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979df158fa00
[Wed Mar 18 22:37:44 2020] LustreError: 15080:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979f69ba0000
[Wed Mar 18 22:37:46 2020] LustreError: 16563:0:(tgt_grant.c:750:tgt_grant_check()) public1-OST0003: cli a38e0dfe-5ad8-665c-63d5-4314b98afc7e claims 4218880 GRANT, real grant 0
[Wed Mar 18 22:37:46 2020] LustreError: 16563:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 8795 previous similar messages
[Wed Mar 18 22:37:50 2020] LustreError: 15081:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9794d5712400
[Wed Mar 18 22:37:56 2020] LustreError: 15079:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9794582cec00
[Wed Mar 18 22:38:02 2020] LustreError: 15079:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9798a5855600
[Wed Mar 18 22:38:08 2020] LustreError: 15080:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9790601e3e00
[Wed Mar 18 22:38:14 2020] LustreError: 15081:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979ef589b600
[Wed Mar 18 22:38:20 2020] LustreError: 15080:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9798a5857c00
[Wed Mar 18 22:38:26 2020] LustreError: 15081:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979b121eb200
[Wed Mar 18 22:38:32 2020] LustreError: 15081:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9791a45d4a00
[Wed Mar 18 22:38:38 2020] LustreError: 15078:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979d4d518600
[Wed Mar 18 22:38:44 2020] LustreError: 15081:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979418254600
[Wed Mar 18 22:38:50 2020] LustreError: 15079:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff97995b8de200
[Wed Mar 18 22:38:56 2020] LustreError: 15081:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979b3a982400
[Wed Mar 18 22:39:02 2020] LustreError: 15079:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979e24faf400
[Wed Mar 18 22:39:08 2020] LustreError: 15078:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979e24faca00
[Wed Mar 18 22:39:14 2020] LustreError: 15078:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979f825c0e00
[Wed Mar 18 22:39:20 2020] LustreError: 15080:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979009428e00
[Wed Mar 18 22:39:26 2020] LustreError: 15078:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff97ac3ca01600
[Wed Mar 18 22:39:32 2020] LustreError: 15081:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9794ca3aaa00
[Wed Mar 18 22:39:38 2020] LustreError: 15080:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff97aff0cf4400
[Wed Mar 18 22:39:44 2020] LustreError: 15079:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979deb138800
[Wed Mar 18 22:39:50 2020] LustreError: 15081:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9791ccf73e00
[Wed Mar 18 22:39:56 2020] LustreError: 15079:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff97ae2738d000
[Wed Mar 18 22:40:02 2020] LustreError: 15080:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9795d504f600
[Wed Mar 18 22:40:08 2020] LustreError: 15079:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979df33db600
[Wed Mar 18 22:40:14 2020] LustreError: 15081:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979d61544400
[Wed Mar 18 22:40:20 2020] LustreError: 15081:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9790176e7400
[Wed Mar 18 22:40:26 2020] LustreError: 15080:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff97902300a800
[Wed Mar 18 22:40:32 2020] LustreError: 15081:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9790176e0600
[Wed Mar 18 22:40:38 2020] LNetError: 15077:0:(o2iblnd_cb.c:3335:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds
[Wed Mar 18 22:40:38 2020] LNetError: 15077:0:(o2iblnd_cb.c:3335:kiblnd_check_txs_locked()) Skipped 20 previous similar messages
[Wed Mar 18 22:40:38 2020] LNetError: 15077:0:(o2iblnd_cb.c:3410:kiblnd_check_conns()) Timed out RDMA with 10.10.4.4@o2ib (6): c: 7, oc: 0, rc: 8
[Wed Mar 18 22:40:38 2020] LNetError: 15077:0:(o2iblnd_cb.c:3410:kiblnd_check_conns()) Skipped 20 previous similar messages
[Wed Mar 18 22:40:38 2020] LustreError: 15078:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979b3ed53a00
[Wed Mar 18 22:40:44 2020] LustreError: 15078:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979ee39a4600
[Wed Mar 18 22:40:50 2020] LustreError: 15081:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979b3ed55200
[Wed Mar 18 22:40:56 2020] LustreError: 15081:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9796b8a3da00
[Wed Mar 18 22:41:02 2020] LustreError: 15079:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979012dafe00
[Wed Mar 18 22:41:08 2020] LustreError: 15080:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979818a8d200
[Wed Mar 18 22:41:14 2020] LustreError: 15081:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979f6dfcf000
[Wed Mar 18 22:41:20 2020] LustreError: 15081:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9796b8a38200
[Wed Mar 18 22:41:26 2020] LustreError: 15079:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9796b8a3fc00
[Wed Mar 18 22:41:32 2020] LustreError: 15080:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979fb17de000
[Wed Mar 18 22:41:38 2020] LustreError: 15080:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979313b13c00
[Wed Mar 18 22:41:44 2020] LustreError: 15078:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979fb17dec00
[Wed Mar 18 22:41:50 2020] LustreError: 15081:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979a91b50e00
[Wed Mar 18 22:41:50 2020] Lustre: public1-OST0003: Bulk IO read error with eddc1d5f-0847-ea0c-2573-7f8e65a9b5dc (at 10.10.4.4@o2ib), client will retry: rc -110
[Wed Mar 18 22:41:50 2020] Lustre: Skipped 82 previous similar messages
[Wed Mar 18 22:41:56 2020] LustreError: 15080:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979033e3ea00
[Wed Mar 18 22:42:02 2020] LustreError: 15078:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff97901bcbb600
[Wed Mar 18 22:42:02 2020] LustreError: 169829:0:(ldlm_lib.c:3253:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff9798a84a8850 x1661492078327264/t0(0) o3->eddc1d5f-0847-ea0c-2573-7f8e65a9b5dc@10.10.4.4@o2ib:302/0 lens 488/440 e 0 to 0 dl 1584542207 ref 1 fl Interpret:/2/0 rc 0/0
[Wed Mar 18 22:42:02 2020] LustreError: 169829:0:(ldlm_lib.c:3253:target_bulk_io()) Skipped 82 previous similar messages
[Wed Mar 18 22:42:08 2020] LustreError: 15080:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff97903bc5ee00
[Wed Mar 18 22:42:14 2020] LustreError: 15079:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff97a081dcd200
[Wed Mar 18 22:42:20 2020] LustreError: 15078:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9798612c3800
[Wed Mar 18 22:42:20 2020] Lustre: public1-OST0003: Client eddc1d5f-0847-ea0c-2573-7f8e65a9b5dc (at 10.10.4.4@o2ib) reconnecting
[Wed Mar 18 22:42:20 2020] Lustre: Skipped 84 previous similar messages
[Wed Mar 18 22:42:26 2020] LustreError: 15079:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff97901bcba800
[Wed Mar 18 22:42:32 2020] LustreError: 15079:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff97978d609000
[Wed Mar 18 22:42:38 2020] LustreError: 15079:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979cef941600
[Wed Mar 18 22:42:44 2020] LustreError: 15079:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979011ab9600
[Wed Mar 18 22:42:50 2020] LustreError: 15079:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9792a838fc00
[Wed Mar 18 22:42:56 2020] LustreError: 15080:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979636763a00
[Wed Mar 18 22:43:02 2020] LustreError: 15080:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff97962a0b6600
[Wed Mar 18 22:43:08 2020] LustreError: 15080:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979b43437800
[Wed Mar 18 22:43:14 2020] LustreError: 15081:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979212905600
[Wed Mar 18 22:43:20 2020] LustreError: 15078:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff97935cc74800
[Wed Mar 18 22:43:26 2020] LustreError: 15079:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979022cfbe00
[Wed Mar 18 22:43:32 2020] LustreError: 15078:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979043164200
[Wed Mar 18 22:43:38 2020] LustreError: 15081:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff97a081dca000
[Wed Mar 18 22:43:44 2020] LustreError: 15080:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979425f48400
[Wed Mar 18 22:43:50 2020] LustreError: 15078:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979f11db4a00
[Wed Mar 18 22:43:56 2020] LustreError: 15078:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979de4034800
[Wed Mar 18 22:44:02 2020] LustreError: 15079:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979de4035e00
[Wed Mar 18 22:44:08 2020] LustreError: 15081:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979e8c36bc00
[Wed Mar 18 22:44:14 2020] LustreError: 15080:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff97a820c0ee00
[Wed Mar 18 22:44:20 2020] LustreError: 15079:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9795d72e3800
I enabled neterror on the client, and then I can see:
[Wed Mar 18 22:53:44 2020] Lustre: 25727:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 100 previous similar messages
[Wed Mar 18 22:53:50 2020] LNet: 25711:0:(o2iblnd_cb.c:2065:kiblnd_close_conn_locked()) Closing conn to 10.10.2.22@o2ib: error -110(waiting)
[Wed Mar 18 22:53:50 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
[Wed Mar 18 22:53:56 2020] LNet: 25711:0:(o2iblnd_cb.c:2065:kiblnd_close_conn_locked()) Closing conn to 10.10.2.22@o2ib: error -110(waiting)
[Wed Mar 18 22:53:56 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
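For reference, the negative status values in these messages are Linux errno codes: -5 is EIO, -103 is ECONNABORTED, and -110 is ETIMEDOUT. Since the logs are very repetitive, a short script can summarize the bulk callback errors. This is a minimal sketch under stated assumptions: the function name tally_bulk_events and the regular expression are hypothetical helpers written against the message format shown above, and the errno table assumes Linux numbering.

```python
import re
from collections import Counter

# Linux errno numbers for the status codes seen in the logs above
# (assumption: the nodes use standard Linux errno numbering).
ERRNO_NAMES = {5: "EIO", 103: "ECONNABORTED", 110: "ETIMEDOUT"}

# Matches e.g. "(events.c:200:client_bulk_callback()) event type 2, status -103"
BULK_RE = re.compile(
    r"\((\w+\.c):\d+:(\w+)\(\)\) event type (\d+), status (-?\d+)"
)


def tally_bulk_events(lines):
    """Count bulk callback messages by (function, event type, status, errno name)."""
    counts = Counter()
    for line in lines:
        match = BULK_RE.search(line)
        if match is None:
            continue  # not a bulk callback message
        _source_file, func, event_type, status = match.groups()
        status = int(status)
        name = ERRNO_NAMES.get(-status, "unknown")
        counts[(func, int(event_type), status, name)] += 1
    return counts
```

Feeding it the output of dmesg, one line per message, gives one count per distinct combination, which makes it easier to compare the client side (status -103) with the oss2 side (status -5) over the same time window.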
I have rebooted the client, but that did not help.
Also, IPoIB works fine on all nodes.
Any suggestions would be appreciated.
Thanks.