Lustre / LU-12279

Client got evicted due to a network issue

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Affects Version/s: Lustre 2.13.0, Lustre 2.12.2
    • Fix Version/s: Lustre 2.13.0, Lustre 2.12.2
    • Environment: version=2.12.53_13_g4191e0c
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      Soak has been running on the master branch (version 2.12.53_13_g4191e0c) for about 2 days. There has been no crash, but many applications failed: 511 failed vs. 956 passed (about 35% failures). The syslog suggests the failures are caused by a network issue. The first 24 hours looked good, with a failure rate similar to 2.12.1, but as the test went on, applications started failing much more often.

      Some of the error messages look similar to those in LU-12065, which has already been fixed in this version.

      [root@soak-16 syslog]# grep -r "Async QP"
      soak-20.log:May  8 06:43:29 soak-20 kernel: LNetError: 0:0:(o2iblnd_cb.c:3665:kiblnd_qp_event()) 192.168.1.105@o2ib: Async QP event type 1
      soak-35.log:May  8 06:42:10 soak-35 kernel: LNetError: 0:0:(o2iblnd_cb.c:3665:kiblnd_qp_event()) 192.168.1.105@o2ib: Async QP event type 1
      soak-36.log:May  8 06:41:43 soak-36 kernel: LNetError: 0:0:(o2iblnd_cb.c:3665:kiblnd_qp_event()) 192.168.1.105@o2ib: Async QP event type 1
      soak-17.log:May  8 06:42:27 soak-17 kernel: LNetError: 0:0:(o2iblnd_cb.c:3665:kiblnd_qp_event()) 192.168.1.105@o2ib: Async QP event type 1
      soak-38.log:May  8 06:42:09 soak-38 kernel: LNetError: 0:0:(o2iblnd_cb.c:3665:kiblnd_qp_event()) 192.168.1.105@o2ib: Async QP event type 1
      soak-40.log:May  8 06:41:49 soak-40 kernel: LNetError: 0:0:(o2iblnd_cb.c:3665:kiblnd_qp_event()) 192.168.1.105@o2ib: Async QP event type 1
      [root@soak-16 syslog]# 
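The six "Async QP" lines above can be summarized with a short script to see which peer the events name, which nodes logged them, and over what time window. This is a minimal Python sketch, not part of the original report. It assumes the number after "event type" is a value of the Linux kernel's enum ib_event_type (include/rdma/ib_verbs.h), in which 1 is IB_EVENT_QP_FATAL; the helper name summarize and the partial lookup table are illustrative only.

```python
import re
from collections import Counter

# The six "Async QP" lines from the grep above, reproduced verbatim.
GREP_OUTPUT = """\
soak-20.log:May  8 06:43:29 soak-20 kernel: LNetError: 0:0:(o2iblnd_cb.c:3665:kiblnd_qp_event()) 192.168.1.105@o2ib: Async QP event type 1
soak-35.log:May  8 06:42:10 soak-35 kernel: LNetError: 0:0:(o2iblnd_cb.c:3665:kiblnd_qp_event()) 192.168.1.105@o2ib: Async QP event type 1
soak-36.log:May  8 06:41:43 soak-36 kernel: LNetError: 0:0:(o2iblnd_cb.c:3665:kiblnd_qp_event()) 192.168.1.105@o2ib: Async QP event type 1
soak-17.log:May  8 06:42:27 soak-17 kernel: LNetError: 0:0:(o2iblnd_cb.c:3665:kiblnd_qp_event()) 192.168.1.105@o2ib: Async QP event type 1
soak-38.log:May  8 06:42:09 soak-38 kernel: LNetError: 0:0:(o2iblnd_cb.c:3665:kiblnd_qp_event()) 192.168.1.105@o2ib: Async QP event type 1
soak-40.log:May  8 06:41:49 soak-40 kernel: LNetError: 0:0:(o2iblnd_cb.c:3665:kiblnd_qp_event()) 192.168.1.105@o2ib: Async QP event type 1
"""

# Assumption: "event type N" is the kernel's enum ib_event_type value.
# Only the first few enum entries are listed here.
IB_EVENT_NAMES = {
    0: "IB_EVENT_CQ_ERR",
    1: "IB_EVENT_QP_FATAL",
    2: "IB_EVENT_QP_REQ_ERR",
    3: "IB_EVENT_QP_ACCESS_ERR",
    4: "IB_EVENT_COMM_EST",
}

# file:timestamp host kernel: LNetError: ...kiblnd_qp_event()) NID: Async QP event type N
LINE_RE = re.compile(
    r"^(?P<file>\S+\.log):(?P<ts>\w{3}\s+\d+ \d\d:\d\d:\d\d) (?P<host>\S+) kernel: "
    r"LNetError: .*kiblnd_qp_event\(\)\) (?P<nid>\S+): Async QP event type (?P<type>\d+)$"
)


def summarize(text):
    """Hypothetical helper: count peer NIDs, list (time, host) pairs, name event types."""
    peers = Counter()
    events = []
    types = set()
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # skip anything that is not an Async QP line
        peers[m["nid"]] += 1
        # All lines share the same date, so sorting by HH:MM:SS is enough.
        events.append((m["ts"].split()[-1], m["host"]))
        code = int(m["type"])
        types.add(IB_EVENT_NAMES.get(code, f"unknown({code})"))
    return peers, sorted(events), types


if __name__ == "__main__":
    peers, events, types = summarize(GREP_OUTPUT)
    print(dict(peers))
    print(events[0], events[-1])
    print(types)
```

Run against the grep output, it reports all six events naming the same peer, 192.168.1.105@o2ib, between 06:41:43 (soak-36) and 06:43:29 (soak-20).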
      

      Many of the following errors appeared in the client syslog:

      May  8 07:24:51 soak-17 kernel: LustreError: 218649:0:(import.c:343:ptlrpc_invalidate_import()) soaked-OST0009_UUID: rc = -110 waiting for callback (6 != 0)
      May  8 07:24:51 soak-17 kernel: LustreError: 218649:0:(import.c:369:ptlrpc_invalidate_import()) @@@ still on sending list  req@ffff94340487a400 x1632824181437936/t0(0) o4->soaked-OST0009-osc-ffff943a9be9a800@192.168.1.105@o2ib:6/4 lens 488/448 e 0 to 0 dl 1557297784 ref 2 fl UnregBULK:ES/0/ffffffff rc -5/-1
      May  8 07:24:51 soak-17 kernel: LustreError: 218649:0:(import.c:383:ptlrpc_invalidate_import()) soaked-OST0009_UUID: Unregistering RPCs found (6). Network is sluggish? Waiting them to error out.
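The client-side errors above pack several numeric fields into each line. This hedged Python sketch (the helper name decode is illustrative) pulls out the OST target, the return codes, the RPC counts and the request deadline. It assumes Linux errno numbering, so "rc = -110" reads as -ETIMEDOUT and the request's "rc -5" as -EIO; it also assumes the "dl" field is the request deadline in seconds since the Unix epoch and that the first number after "rc" on the request line is the request status.

```python
import re
from datetime import datetime, timezone

# Client syslog lines from the report, reproduced verbatim.
LOG = """\
May  8 07:24:51 soak-17 kernel: LustreError: 218649:0:(import.c:343:ptlrpc_invalidate_import()) soaked-OST0009_UUID: rc = -110 waiting for callback (6 != 0)
May  8 07:24:51 soak-17 kernel: LustreError: 218649:0:(import.c:369:ptlrpc_invalidate_import()) @@@ still on sending list  req@ffff94340487a400 x1632824181437936/t0(0) o4->soaked-OST0009-osc-ffff943a9be9a800@192.168.1.105@o2ib:6/4 lens 488/448 e 0 to 0 dl 1557297784 ref 2 fl UnregBULK:ES/0/ffffffff rc -5/-1
May  8 07:24:51 soak-17 kernel: LustreError: 218649:0:(import.c:383:ptlrpc_invalidate_import()) soaked-OST0009_UUID: Unregistering RPCs found (6). Network is sluggish? Waiting them to error out.
"""

# Linux errno numbers, hardcoded so the result does not depend on the
# platform running this script (Lustre clients run on Linux).
LINUX_ERRNO = {5: "EIO", 110: "ETIMEDOUT"}


def decode(text):
    """Hypothetical helper: extract target, return codes, counts and deadline."""
    info = {}
    m = re.search(r"(\S+_UUID): rc = (-?\d+) waiting for callback \((\d+) != 0\)", text)
    if m:
        info["target"] = m.group(1)
        info["rc"] = int(m.group(2))
        info["errno"] = LINUX_ERRNO.get(-info["rc"], "unknown")
        info["pending_callbacks"] = int(m.group(3))
    # Assumption: "dl" is the request deadline in seconds since the Unix epoch.
    m = re.search(r" dl (\d+) ", text)
    if m:
        deadline = datetime.fromtimestamp(int(m.group(1)), tz=timezone.utc)
        info["deadline_utc"] = deadline.isoformat()
    # Assumption: the first number after "rc" on the request line is the request status.
    m = re.search(r" rc (-?\d+)/(-?\d+)$", text, re.MULTILINE)
    if m:
        info["req_status"] = int(m.group(1))
        info["req_errno"] = LINUX_ERRNO.get(-info["req_status"], "unknown")
    m = re.search(r"Unregistering RPCs found \((\d+)\)", text)
    if m:
        info["unregistering_rpcs"] = int(m.group(1))
    return info


if __name__ == "__main__":
    for key, value in decode(LOG).items():
        print(f"{key}: {value}")
```

Under those assumptions the deadline is 2019-05-08 06:43:04 UTC; how that relates to the 07:24:51 syslog timestamps depends on the hosts' time zone, which the report does not state.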
      

            People

              Assignee: Amir Shehata (Inactive)
              Reporter: Sarah Liu