LU-13370: Some clients intermittently fail to lctl ping one OSS

Details

    • Bug
    • Resolution: Unresolved
    • Blocker
    • Environment:
      OS: CentOS Linux release 7.6.1810
      Kernel: 3.10.0-957.10.1.el7_lustre.x86_64
      OFED: MLNX_OFED_LINUX-4.6-1.0.1.1
      Lustre version: 2.12.2 (clients and servers)

    Description

      We have two OSSes (oss1 and oss2). Recently, on some clients, lctl ping to oss2 fails for a while, then succeeds, then fails again, and so on:

      [root@d2704 ~]# lctl ping 10.10.2.22@o2ib
      12345-0@lo
      12345-10.10.2.22@o2ib
      [root@d2704 ~]# lctl ping 10.10.2.22@o2ib
      failed to ping 10.10.2.22@o2ib: Input/output error
      [root@d2704 ~]# lctl ping 10.10.2.22@o2ib
      failed to ping 10.10.2.22@o2ib: Input/output error
      [root@d2704 ~]# lctl ping 10.10.2.22@o2ib
      failed to ping 10.10.2.22@o2ib: Input/output error
      [root@d2704 ~]# lctl ping 10.10.2.22@o2ib
      failed to ping 10.10.2.22@o2ib: Input/output error
      [root@d2704 ~]# lctl ping 10.10.2.22@o2ib
      failed to ping 10.10.2.22@o2ib: Input/output error
      [root@d2704 ~]# lctl ping 10.10.2.22@o2ib
      failed to ping 10.10.2.22@o2ib: Input/output error
      

      However, lctl ping to oss1 always works fine:

      [root@d2704 ~]# lctl ping 10.10.2.21@o2ib
      12345-0@lo
      12345-10.10.2.21@o2ib
      [root@d2704 ~]# lctl ping 10.10.2.21@o2ib
      12345-0@lo
      12345-10.10.2.21@o2ib
      [root@d2704 ~]# lctl ping 10.10.2.21@o2ib
      12345-0@lo
      12345-10.10.2.21@o2ib
      [root@d2704 ~]# lctl ping 10.10.2.21@o2ib
      12345-0@lo
      12345-10.10.2.21@o2ib
      [root@d2704 ~]# lctl ping 10.10.2.21@o2ib
      12345-0@lo
      12345-10.10.2.21@o2ib
      [root@d2704 ~]# lctl ping 10.10.2.21@o2ib
      12345-0@lo
      12345-10.10.2.21@o2ib
      [root@d2704 ~]# lctl ping 10.10.2.21@o2ib
      12345-0@lo
      12345-10.10.2.21@o2ib
      [root@d2704 ~]# lctl ping 10.10.2.21@o2ib
      12345-0@lo
      12345-10.10.2.21@o2ib
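
To make the fail/succeed pattern above easier to report, the pings can be sampled at a fixed interval and collapsed into runs of consecutive successes and failures. The following is only a minimal sketch: the helper names (`lctl_ping`, `collect`, `summarize_runs`) are hypothetical, and it assumes that `lctl ping` exits with a non-zero status when it prints "failed to ping".

```python
import subprocess
import time
from typing import Callable, Iterable, List, Tuple

Sample = Tuple[float, bool]            # (unix timestamp, ping succeeded?)
Run = Tuple[bool, float, float, int]   # (ok, first ts, last ts, sample count)


def lctl_ping(nid: str, timeout_s: float = 30.0) -> bool:
    """Run `lctl ping <nid>` once.

    Assumption: a failed ping is reported through a non-zero exit status.
    """
    try:
        result = subprocess.run(
            ["lctl", "ping", nid],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def collect(
    nid: str,
    count: int,
    interval_s: float,
    ping: Callable[[str], bool] = lctl_ping,
    clock: Callable[[], float] = time.time,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Sample]:
    """Ping `nid` `count` times, `interval_s` seconds apart."""
    samples: List[Sample] = []
    for i in range(count):
        samples.append((clock(), ping(nid)))
        if i + 1 < count:
            sleep(interval_s)
    return samples


def summarize_runs(samples: Iterable[Sample]) -> List[Run]:
    """Collapse consecutive samples with the same result into runs."""
    runs: List[Run] = []
    for ts, ok in samples:
        if runs and runs[-1][0] == ok:
            prev_ok, start, _, n = runs[-1]
            runs[-1] = (prev_ok, start, ts, n + 1)
        else:
            runs.append((ok, ts, ts, 1))
    return runs
```

For example, `summarize_runs(collect("10.10.2.22@o2ib", 120, 5.0))` would sample oss2 for about ten minutes; running the same call against 10.10.2.21@o2ib gives a baseline for oss1. The `ping`, `clock` and `sleep` parameters exist only so the logic can be exercised without a Lustre node.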
      

      In the client's dmesg I can see:

      [Wed Mar 18 22:28:15 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:28:21 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:28:27 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:28:33 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:28:39 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:28:45 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:28:51 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:28:57 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:29:03 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:29:09 2020] LNetError: 25711:0:(o2iblnd_cb.c:3335:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds
      [Wed Mar 18 22:29:09 2020] LNetError: 25711:0:(o2iblnd_cb.c:3335:kiblnd_check_txs_locked()) Skipped 63 previous similar messages
      [Wed Mar 18 22:29:09 2020] LNetError: 25711:0:(o2iblnd_cb.c:3410:kiblnd_check_conns()) Timed out RDMA with 10.10.2.22@o2ib (0): c: 2, oc: 0, rc: 8
      [Wed Mar 18 22:29:09 2020] LNetError: 25711:0:(o2iblnd_cb.c:3410:kiblnd_check_conns()) Skipped 63 previous similar messages
      [Wed Mar 18 22:29:09 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:29:15 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:29:21 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:29:27 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:29:33 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:29:39 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:29:45 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:29:51 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:29:57 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:30:03 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:30:09 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:30:10 2020] LNet: 25711:0:(o2iblnd_cb.c:3381:kiblnd_check_conns()) Timed out tx for 10.10.2.22@o2ib: 6 seconds
      [Wed Mar 18 22:31:06 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:31:12 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:31:18 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:31:24 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:31:30 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:31:36 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:31:42 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:31:48 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:31:54 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:32:00 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:32:06 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:32:12 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:32:18 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:32:24 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:32:30 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:32:36 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:32:42 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:32:48 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:32:54 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:33:00 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:33:06 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:33:12 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:33:18 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:33:24 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:33:30 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:33:36 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:33:36 2020] Lustre: 25727:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1584542013/real 1584542013]  req@ffff8d3396b56780 x1661492078327264/t0(0) o3->public1-OST0003-osc-ffff8d3ae8ab1800@10.10.2.22@o2ib:6/4 lens 488/440 e 0 to 1 dl 1584542100 ref 2 fl Rpc:eX/2/ffffffff rc 0/-1
      [Wed Mar 18 22:33:36 2020] Lustre: 25727:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 84 previous similar messages
      [Wed Mar 18 22:33:42 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:33:48 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:33:54 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:34:00 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:34:00 2020] Lustre: public1-OST0003-osc-ffff8d3ae8ab1800: Connection to public1-OST0003 (at 10.10.2.22@o2ib) was lost; in progress operations using this service will wait for recovery to complete
      [Wed Mar 18 22:34:00 2020] Lustre: Skipped 84 previous similar messages
      [Wed Mar 18 22:34:00 2020] Lustre: public1-OST0003-osc-ffff8d3ae8ab1800: Connection restored to 10.10.2.22@o2ib (at 10.10.2.22@o2ib)
      [Wed Mar 18 22:34:00 2020] Lustre: Skipped 84 previous similar messages
      [Wed Mar 18 22:34:06 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:34:12 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:34:18 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:34:24 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:34:30 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:34:36 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:34:42 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:34:48 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:34:54 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:35:00 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:35:06 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:35:12 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:35:18 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:35:24 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:35:30 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:35:36 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:35:42 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:35:48 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:35:54 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:36:00 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:36:06 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:36:12 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:36:18 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      

      And in oss2's dmesg I can see:

      [Wed Mar 18 22:32:19 2020] Lustre: Skipped 91 previous similar messages
      [Wed Mar 18 22:32:25 2020] LustreError: 15078:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979eea197e00
      [Wed Mar 18 22:32:31 2020] LustreError: 15079:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9793a0362000
      [Wed Mar 18 22:32:37 2020] LustreError: 15080:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979eea190200
      [Wed Mar 18 22:32:43 2020] LustreError: 15078:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff97900a61d600
      [Wed Mar 18 22:32:49 2020] LustreError: 15079:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff97900a61f000
      [Wed Mar 18 22:32:55 2020] LustreError: 15081:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979053a59600
      [Wed Mar 18 22:33:01 2020] LustreError: 15078:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979c94ac2a00
      [Wed Mar 18 22:33:07 2020] LustreError: 15079:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979053242400
      [Wed Mar 18 22:33:13 2020] LustreError: 15080:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979045c19e00
      [Wed Mar 18 22:33:19 2020] LustreError: 15078:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979045c1e600
      [Wed Mar 18 22:33:25 2020] LustreError: 15079:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9799b8a7b200
      [Wed Mar 18 22:33:31 2020] LustreError: 15078:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9798c256ec00
      [Wed Mar 18 22:33:37 2020] LustreError: 15081:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9799f3171c00
      [Wed Mar 18 22:33:43 2020] LustreError: 15080:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979df0abaa00
      [Wed Mar 18 22:33:49 2020] LustreError: 15079:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979702b22c00
      [Wed Mar 18 22:33:55 2020] LustreError: 15078:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9796bad34a00
      [Wed Mar 18 22:34:53 2020] LustreError: 15081:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9790551f9800
      [Wed Mar 18 22:34:59 2020] LustreError: 15081:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979b984fa800
      [Wed Mar 18 22:35:05 2020] LustreError: 15081:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9796e069f200
      [Wed Mar 18 22:35:11 2020] LustreError: 15078:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979047b68800
      [Wed Mar 18 22:35:17 2020] LustreError: 15078:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979dc5cf2c00
      [Wed Mar 18 22:35:23 2020] LustreError: 15081:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979048e77a00
      [Wed Mar 18 22:35:29 2020] LustreError: 15081:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979dc5cf1800
      [Wed Mar 18 22:35:35 2020] LustreError: 15078:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff97923f8be800
      [Wed Mar 18 22:35:41 2020] LustreError: 15081:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9794df7b5a00
      [Wed Mar 18 22:35:47 2020] LustreError: 15081:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979047b69200
      [Wed Mar 18 22:35:47 2020] Lustre: public1-OST0003: Connection restored to  (at 10.10.4.4@o2ib)
      [Wed Mar 18 22:35:47 2020] Lustre: Skipped 280 previous similar messages
      [Wed Mar 18 22:35:53 2020] LustreError: 15080:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff97a016724800
      [Wed Mar 18 22:35:59 2020] LustreError: 15081:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979dc6b84600
      [Wed Mar 18 22:36:05 2020] LustreError: 15078:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9796bad37000
      [Wed Mar 18 22:36:11 2020] LustreError: 15079:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979060264200
      [Wed Mar 18 22:36:17 2020] LustreError: 15078:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9795ae4bcc00
      [Wed Mar 18 22:36:23 2020] LustreError: 15081:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9797ce935a00
      [Wed Mar 18 22:36:29 2020] LustreError: 15079:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff97987de45800
      [Wed Mar 18 22:36:35 2020] LustreError: 15080:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979060261400
      [Wed Mar 18 22:36:41 2020] LustreError: 15081:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979802423c00
      [Wed Mar 18 22:36:47 2020] LustreError: 15080:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979df158fa00
      [Wed Mar 18 22:37:44 2020] LustreError: 15080:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979f69ba0000
      [Wed Mar 18 22:37:46 2020] LustreError: 16563:0:(tgt_grant.c:750:tgt_grant_check()) public1-OST0003: cli a38e0dfe-5ad8-665c-63d5-4314b98afc7e claims 4218880 GRANT, real grant 0
      [Wed Mar 18 22:37:46 2020] LustreError: 16563:0:(tgt_grant.c:750:tgt_grant_check()) Skipped 8795 previous similar messages
      [Wed Mar 18 22:37:50 2020] LustreError: 15081:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9794d5712400
      [Wed Mar 18 22:37:56 2020] LustreError: 15079:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9794582cec00
      [Wed Mar 18 22:38:02 2020] LustreError: 15079:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9798a5855600
      [Wed Mar 18 22:38:08 2020] LustreError: 15080:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9790601e3e00
      [Wed Mar 18 22:38:14 2020] LustreError: 15081:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979ef589b600
      [Wed Mar 18 22:38:20 2020] LustreError: 15080:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9798a5857c00
      [Wed Mar 18 22:38:26 2020] LustreError: 15081:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979b121eb200
      [Wed Mar 18 22:38:32 2020] LustreError: 15081:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9791a45d4a00
      [Wed Mar 18 22:38:38 2020] LustreError: 15078:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979d4d518600
      [Wed Mar 18 22:38:44 2020] LustreError: 15081:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979418254600
      [Wed Mar 18 22:38:50 2020] LustreError: 15079:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff97995b8de200
      [Wed Mar 18 22:38:56 2020] LustreError: 15081:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979b3a982400
      [Wed Mar 18 22:39:02 2020] LustreError: 15079:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979e24faf400
      [Wed Mar 18 22:39:08 2020] LustreError: 15078:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979e24faca00
      [Wed Mar 18 22:39:14 2020] LustreError: 15078:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979f825c0e00
      [Wed Mar 18 22:39:20 2020] LustreError: 15080:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979009428e00
      [Wed Mar 18 22:39:26 2020] LustreError: 15078:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff97ac3ca01600
      [Wed Mar 18 22:39:32 2020] LustreError: 15081:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9794ca3aaa00
      [Wed Mar 18 22:39:38 2020] LustreError: 15080:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff97aff0cf4400
      [Wed Mar 18 22:39:44 2020] LustreError: 15079:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979deb138800
      [Wed Mar 18 22:39:50 2020] LustreError: 15081:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9791ccf73e00
      [Wed Mar 18 22:39:56 2020] LustreError: 15079:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff97ae2738d000
      [Wed Mar 18 22:40:02 2020] LustreError: 15080:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9795d504f600
      [Wed Mar 18 22:40:08 2020] LustreError: 15079:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979df33db600
      [Wed Mar 18 22:40:14 2020] LustreError: 15081:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979d61544400
      [Wed Mar 18 22:40:20 2020] LustreError: 15081:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9790176e7400
      [Wed Mar 18 22:40:26 2020] LustreError: 15080:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff97902300a800
      [Wed Mar 18 22:40:32 2020] LustreError: 15081:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9790176e0600
      [Wed Mar 18 22:40:38 2020] LNetError: 15077:0:(o2iblnd_cb.c:3335:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds
      [Wed Mar 18 22:40:38 2020] LNetError: 15077:0:(o2iblnd_cb.c:3335:kiblnd_check_txs_locked()) Skipped 20 previous similar messages
      [Wed Mar 18 22:40:38 2020] LNetError: 15077:0:(o2iblnd_cb.c:3410:kiblnd_check_conns()) Timed out RDMA with 10.10.4.4@o2ib (6): c: 7, oc: 0, rc: 8
      [Wed Mar 18 22:40:38 2020] LNetError: 15077:0:(o2iblnd_cb.c:3410:kiblnd_check_conns()) Skipped 20 previous similar messages
      [Wed Mar 18 22:40:38 2020] LustreError: 15078:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979b3ed53a00
      [Wed Mar 18 22:40:44 2020] LustreError: 15078:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979ee39a4600
      [Wed Mar 18 22:40:50 2020] LustreError: 15081:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979b3ed55200
      [Wed Mar 18 22:40:56 2020] LustreError: 15081:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9796b8a3da00
      [Wed Mar 18 22:41:02 2020] LustreError: 15079:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979012dafe00
      [Wed Mar 18 22:41:08 2020] LustreError: 15080:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979818a8d200
      [Wed Mar 18 22:41:14 2020] LustreError: 15081:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979f6dfcf000
      [Wed Mar 18 22:41:20 2020] LustreError: 15081:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9796b8a38200
      [Wed Mar 18 22:41:26 2020] LustreError: 15079:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9796b8a3fc00
      [Wed Mar 18 22:41:32 2020] LustreError: 15080:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979fb17de000
      [Wed Mar 18 22:41:38 2020] LustreError: 15080:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979313b13c00
      [Wed Mar 18 22:41:44 2020] LustreError: 15078:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979fb17dec00
      [Wed Mar 18 22:41:50 2020] LustreError: 15081:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979a91b50e00
      [Wed Mar 18 22:41:50 2020] Lustre: public1-OST0003: Bulk IO read error with eddc1d5f-0847-ea0c-2573-7f8e65a9b5dc (at 10.10.4.4@o2ib), client will retry: rc -110
      [Wed Mar 18 22:41:50 2020] Lustre: Skipped 82 previous similar messages
      [Wed Mar 18 22:41:56 2020] LustreError: 15080:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979033e3ea00
      [Wed Mar 18 22:42:02 2020] LustreError: 15078:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff97901bcbb600
      [Wed Mar 18 22:42:02 2020] LustreError: 169829:0:(ldlm_lib.c:3253:target_bulk_io()) @@@ Reconnect on bulk READ  req@ffff9798a84a8850 x1661492078327264/t0(0) o3->eddc1d5f-0847-ea0c-2573-7f8e65a9b5dc@10.10.4.4@o2ib:302/0 lens 488/440 e 0 to 0 dl 1584542207 ref 1 fl Interpret:/2/0 rc 0/0
      [Wed Mar 18 22:42:02 2020] LustreError: 169829:0:(ldlm_lib.c:3253:target_bulk_io()) Skipped 82 previous similar messages
      [Wed Mar 18 22:42:08 2020] LustreError: 15080:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff97903bc5ee00
      [Wed Mar 18 22:42:14 2020] LustreError: 15079:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff97a081dcd200
      [Wed Mar 18 22:42:20 2020] LustreError: 15078:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9798612c3800
      [Wed Mar 18 22:42:20 2020] Lustre: public1-OST0003: Client eddc1d5f-0847-ea0c-2573-7f8e65a9b5dc (at 10.10.4.4@o2ib) reconnecting
      [Wed Mar 18 22:42:20 2020] Lustre: Skipped 84 previous similar messages
      [Wed Mar 18 22:42:26 2020] LustreError: 15079:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff97901bcba800
      [Wed Mar 18 22:42:32 2020] LustreError: 15079:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff97978d609000
      [Wed Mar 18 22:42:38 2020] LustreError: 15079:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979cef941600
      [Wed Mar 18 22:42:44 2020] LustreError: 15079:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979011ab9600
      [Wed Mar 18 22:42:50 2020] LustreError: 15079:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9792a838fc00
      [Wed Mar 18 22:42:56 2020] LustreError: 15080:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979636763a00
      [Wed Mar 18 22:43:02 2020] LustreError: 15080:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff97962a0b6600
      [Wed Mar 18 22:43:08 2020] LustreError: 15080:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979b43437800
      [Wed Mar 18 22:43:14 2020] LustreError: 15081:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979212905600
      [Wed Mar 18 22:43:20 2020] LustreError: 15078:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff97935cc74800
      [Wed Mar 18 22:43:26 2020] LustreError: 15079:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979022cfbe00
      [Wed Mar 18 22:43:32 2020] LustreError: 15078:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979043164200
      [Wed Mar 18 22:43:38 2020] LustreError: 15081:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff97a081dca000
      [Wed Mar 18 22:43:44 2020] LustreError: 15080:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979425f48400
      [Wed Mar 18 22:43:50 2020] LustreError: 15078:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979f11db4a00
      [Wed Mar 18 22:43:56 2020] LustreError: 15078:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979de4034800
      [Wed Mar 18 22:44:02 2020] LustreError: 15079:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979de4035e00
      [Wed Mar 18 22:44:08 2020] LustreError: 15081:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff979e8c36bc00
      [Wed Mar 18 22:44:14 2020] LustreError: 15080:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff97a820c0ee00
      [Wed Mar 18 22:44:20 2020] LustreError: 15079:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9795d72e3800
      

      I enabled neterror on the client, and then I can see:

      [Wed Mar 18 22:53:44 2020] Lustre: 25727:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 100 previous similar messages
      [Wed Mar 18 22:53:50 2020] LNet: 25711:0:(o2iblnd_cb.c:2065:kiblnd_close_conn_locked()) Closing conn to 10.10.2.22@o2ib: error -110(waiting)
      [Wed Mar 18 22:53:50 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
      [Wed Mar 18 22:53:56 2020] LNet: 25711:0:(o2iblnd_cb.c:2065:kiblnd_close_conn_locked()) Closing conn to 10.10.2.22@o2ib: error -110(waiting)
      [Wed Mar 18 22:53:56 2020] LustreError: 25711:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8d31d0b23a00
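
The numeric codes in the excerpts above (status -103, status -5, error -110, rc -110) can be tallied per reporting function and given their symbolic names. This is a rough sketch, assuming the codes are negated Linux errno values as is the usual kernel convention; the names `tally` and `describe` are hypothetical, and the table only covers the three values that appear in these logs.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Linux errno values seen in the logs above (numbers as in Linux errno.h).
LINUX_ERRNO = {
    5: "EIO (Input/output error)",
    103: "ECONNABORTED (Software caused connection abort)",
    110: "ETIMEDOUT (Connection timed out)",
}

# Matches "status -103", "error -110(waiting)" and "rc -110".
CODE_RE = re.compile(r"\b(?:status|error|rc) -(\d+)")
# Matches the reporting function in "(file.c:123:function_name())".
FUNC_RE = re.compile(r":(\w+)\(\)\)")


def describe(code: int) -> str:
    """Return a readable name for a positive errno number."""
    return LINUX_ERRNO.get(code, f"errno {code}")


def tally(lines: Iterable[str]) -> "Counter[Tuple[str, int]]":
    """Count (function, errno) pairs; function is "-" when the line has none."""
    counts: "Counter[Tuple[str, int]]" = Counter()
    for line in lines:
        code_match = CODE_RE.search(line)
        if code_match is None:
            continue
        func_match = FUNC_RE.search(line)
        func = func_match.group(1) if func_match else "-"
        counts[(func, int(code_match.group(1)))] += 1
    return counts
```

Feeding it saved `dmesg` output, for instance `tally(open("client-dmesg.txt"))` with a hypothetical file name, and printing `describe(code)` for each key gives a compact summary: here the client side reports ECONNABORTED and ETIMEDOUT, while oss2 reports EIO on its bulk callbacks.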
      
      

      I have rebooted the client, but that did not help.

      Also, IPoIB works fine on all nodes.

      Any suggestions would be appreciated.

      Thanks.


          People

            wc-triage WC Triage
            xiaozg Xiao Zhenggang