LU-8485: workqueue overflows with mlx5 on Power8 platforms

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • None
    • Lustre 2.8.0, Lustre 2.9.0
    • None
    • Power8 client nodes running RHEL7.2 with Mellanox OFED 3.2-1.04
    • 3

    Description

      In my testing on the Power8 platform, I intermittently see the following errors on the clients, after which Lustre becomes unusable.

      [ 3499.198051] mlx5_warn:mlx5_0:begin_wqe:4013:(pid 7712): work queue overflow
      [ 3499.198176] mlx5_warn:mlx5_0:mlx5_ib_post_send:4112:(pid 7712): Failed to prepare WQE
      [ 3499.198209] mlx5_warn:mlx5_0:begin_wqe:4013:(pid 7715): work queue overflow
      [ 3499.198240] LustreError: 7712:0:(events.c:203:client_bulk_callback()) event type 1, status -12, desc c000001772778c00
      [ 3499.198428] mlx5_warn:mlx5_0:mlx5_ib_post_send:4112:(pid 7715): Failed to prepare WQE
      [ 3499.198527] LustreError: 7715:0:(events.c:203:client_bulk_callback()) event type 1, status -12, desc c000000788600c00
      [ 3499.199804] LustreError: 7713:0:(events.c:203:client_bulk_callback()) event type 1, status -5, desc c000000e27e06800
      [ 3499.199928] LustreError: 7714:0:(events.c:203:client_bulk_callback()) event type 1, status -5, desc c000000788602200
      [ 3499.200740] LustreError: 7712:0:(events.c:203:client_bulk_callback()) event type 1, status -5, desc c00000077cec7400
      [ 3499.201667] LustreError: 7715:0:(events.c:203:client_bulk_callback()) event type 1, status -5, desc c00000039da2f400
      [ 3499.202216] LustreError: 7715:0:(events.c:203:client_bulk_callback()) event type 1, status -5, desc c000000780129c00
      [ 3499.202422] LustreError: 7713:0:(events.c:203:client_bulk_callback()) event type 1, status -5, desc c000000e270c3000
      [ 3499.202642] LustreError: 7715:0:(events.c:203:client_bulk_callback()) event type 1, status -5, desc c000001b98441800
      [ 3499.202864] LustreError: 7712:0:(events.c:203:client_bulk_callback()) event type 1, status -5, desc c000000c6d9fd600
      [ 3499.203091] LustreError: 7714:0:(events.c:203:client_bulk_callback()) event type 1, status -5, desc c000000dd0309200
      [ 3499.203942] LustreError: 7713:0:(events.c:203:client_bulk_callback()) event type 1, status -5, desc c000000e27e06200
      [ 3499.558222] LNet: 7659:0:(o2iblnd_cb.c:1360:kiblnd_reconnect_peer()) Abort reconnection of 10.37.248.77@o2ib1: connected
      [ 3499.558317] LNet: 7659:0:(o2iblnd_cb.c:1360:kiblnd_reconnect_peer()) Skipped 4 previous similar messages
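
For triage, the negative status values in the client_bulk_callback() lines are standard Linux errno codes: -12 is ENOMEM and -5 is EIO. The following is a minimal sketch, not part of the original report, that tallies those statuses from a saved console log. The regular expression and the helper name summarize_bulk_status are assumptions based only on the line format visible in the excerpt above.

```python
import errno
import re
from collections import Counter

# Assumed pattern, derived only from the LustreError lines quoted in this report.
BULK_RE = re.compile(
    r"client_bulk_callback\(\)\) event type (\d+), status (-?\d+)"
)


def summarize_bulk_status(log_text):
    """Count client_bulk_callback statuses by symbolic errno name.

    Hypothetical helper for illustration; negative statuses are mapped
    through the errno table, anything else is kept as its numeric string.
    """
    counts = Counter()
    for match in BULK_RE.finditer(log_text):
        status = int(match.group(2))
        if status < 0:
            name = errno.errorcode.get(-status, str(status))
        else:
            name = str(status)
        counts[name] += 1
    return counts


# Three lines copied from the log excerpt above.
sample = """\
[ 3499.198240] LustreError: 7712:0:(events.c:203:client_bulk_callback()) event type 1, status -12, desc c000001772778c00
[ 3499.198527] LustreError: 7715:0:(events.c:203:client_bulk_callback()) event type 1, status -12, desc c000000788600c00
[ 3499.199804] LustreError: 7713:0:(events.c:203:client_bulk_callback()) event type 1, status -5, desc c000000e27e06800
"""

if __name__ == "__main__":
    for name, count in summarize_bulk_status(sample).most_common():
        print(f"{name}: {count}")
```

In the excerpt, the two ENOMEM failures come from the same PIDs (7712 and 7715) that logged "Failed to prepare WQE" immediately beforehand, and the EIO failures follow; whether the latter are a direct consequence of the overflow is not established by the log alone.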

            People

              doug Doug Oucharek (Inactive)
              simmonsja James A Simmons