  1. Lustre
  2. LU-8693

ko2iblnd receiving IB_WC_MW_BIND_ERR errors.

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Critical
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.8.0, Lustre 2.9.0
    • Labels: None
    • Environment: Power8 running RHEL with a MOFED 3.3 stack.
    • Severity: 3

    Description

      After moving to our production Power8 system, which runs a MOFED stack, we are seeing a new IB error in ko2iblnd that we had not encountered before.

      [ 170.597561] mlx5_warn:mlx5_0:dump_cqe:257:(pid 8738): dump error cqe
      [ 170.597620] mlx5_warn:mlx5_0:dump_cqe:257:(pid 8714): dump error cqe
      [ 170.597622] 00000000 00000000 00000000 00000000
      [ 170.597623] 00000000 00000000 00000000 00000000
      [ 170.597625] 00000000 00000000 00000000 00000000
      [ 170.597626] 00000000 08007806 25000039 0642b3d2
      [ 170.597651] LNet: 8714:0:(o2iblnd_cb.c:3433:kiblnd_complete()) FastReg failed: 6
      [ 170.597728] LNet: 8713:0:(o2iblnd_cb.c:3444:kiblnd_complete()) RDMA (tx: c000003c6a78c5a8) failed: 5
      [ 170.598355] 00000000 00000000 00000000 00000000
      [ 170.598403] 00000000 00000000 00000000 00000000
      [ 170.599245] powernv-cpufreq: CPU 104 on Chip 1 has Pmax restored to 0
      [ 170.599647] LNet: 8714:0:(o2iblnd_cb.c:990:kiblnd_tx_complete()) Tx -> 10.39.232.11@o2ib6 cookie 0x63e sending 1 waiting 0: failed 5
      [ 170.599651] LNet: 8714:0:(o2iblnd_cb.c:990:kiblnd_tx_complete()) Skipped 2 previous similar messages
      [ 170.599654] LNet: 8713:0:(o2iblnd_cb.c:1934:kiblnd_close_conn_locked()) Closing conn to 10.39.232.11@o2ib6: error -5(waiting)
      [ 170.599669] LustreError: 8714:0:(events.c:201:client_bulk_callback()) event type 1, status -5, desc c000003c62cf5c00
      [ 170.599675] Lustre: 8896:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1476124274/real 1476124274] req@c000003c4e340000 x1547828424878916/t0(0) o4->atlastds-OST0035-osc-c000001fc5b75000@10.36.226.69@o2ib:6/4 lens 608/448 e 0 to 1 dl 1476124841 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
      [ 170.599681] Lustre: atlastds-OST0035-osc-c000001fc5b75000: Connection to atlastds-OST0035 (at 10.36.226.69@o2ib) was lost; in progress operations using this service will wait for recovery to complete
      [ 170.611219] 00000000 00000000 00000000 00000000
      [ 170.612270] 00000000 08007806 2500003a 06789cd2
      [ 170.613866] LustreError: 8737:0:(events.c:201:client_bulk_callback()) event type 1, status -5, desc c000001fb98c0400
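
      The numeric codes printed by kiblnd_complete() in the log above are work-completion status values. The following is a minimal sketch for decoding them, assuming the table matches the order of enum ib_wc_status in the kernel's include/rdma/ib_verbs.h (worth checking against the headers of the MOFED build actually in use). The helper name decode_wc_status and the regular expression are illustrative and not part of Lustre.

```python
import re

# Assumption: mirrors the order of `enum ib_wc_status` in the Linux kernel's
# include/rdma/ib_verbs.h; verify against the headers of the running stack.
IB_WC_STATUS = [
    "IB_WC_SUCCESS",
    "IB_WC_LOC_LEN_ERR",
    "IB_WC_LOC_QP_OP_ERR",
    "IB_WC_LOC_EEC_OP_ERR",
    "IB_WC_LOC_PROT_ERR",
    "IB_WC_WR_FLUSH_ERR",
    "IB_WC_MW_BIND_ERR",
    "IB_WC_BAD_RESP_ERR",
    "IB_WC_LOC_ACCESS_ERR",
    "IB_WC_REM_INV_REQ_ERR",
    "IB_WC_REM_ACCESS_ERR",
    "IB_WC_REM_OP_ERR",
    "IB_WC_RETRY_EXC_ERR",
    "IB_WC_RNR_RETRY_EXC_ERR",
    "IB_WC_LOC_RDD_VIOL_ERR",
    "IB_WC_REM_INV_RD_REQ_ERR",
    "IB_WC_REM_ABORT_ERR",
    "IB_WC_INV_EECN_ERR",
    "IB_WC_INV_EEC_STATE_ERR",
    "IB_WC_FATAL_ERR",
    "IB_WC_RESP_TIMEOUT_ERR",
    "IB_WC_GENERAL_ERR",
]

# Hypothetical pattern for the two kiblnd_complete() messages seen in this log:
# "FastReg failed: N" and "RDMA (tx: ...) failed: N".
KIBLND_COMPLETE = re.compile(
    r"kiblnd_complete\(\)\)\s+(FastReg|RDMA)\b.*failed: (\d+)"
)


def decode_wc_status(line):
    """Return (operation, code, status name) for a kiblnd_complete() log line.

    Returns None when the line is not one of the two recognised messages.
    """
    match = KIBLND_COMPLETE.search(line)
    if match is None:
        return None
    code = int(match.group(2))
    name = IB_WC_STATUS[code] if code < len(IB_WC_STATUS) else "UNKNOWN"
    return match.group(1), code, name


if __name__ == "__main__":
    sample = [
        "[ 170.597651] LNet: 8714:0:(o2iblnd_cb.c:3433:kiblnd_complete()) FastReg failed: 6",
        "[ 170.597728] LNet: 8713:0:(o2iblnd_cb.c:3444:kiblnd_complete()) RDMA (tx: c000003c6a78c5a8) failed: 5",
    ]
    for entry in sample:
        print(decode_wc_status(entry))
```

      Under that assumption, the "FastReg failed: 6" line decodes to IB_WC_MW_BIND_ERR, the error named in the issue title, and the "RDMA ... failed: 5" line decodes to IB_WC_WR_FLUSH_ERR. The "status -5" and "error -5" values in the later Lustre messages are negative errno values (-EIO), not work-completion codes, so the sketch deliberately ignores them.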

            People

              ashehata Amir Shehata (Inactive)
              simmonsja James A Simmons