[LU-12901] Failing to create a properly sized IB queue pair

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.14.0
    • Affects Version/s: Lustre 2.13.0
    • Labels: None
    • Environment: Seen with newer Mellanox ConnectX-4 devices
    • Severity: 3

    Description

      While attempting to bring up a file system in our test bed with the latest Lustre version (2.13), I saw this new error during LNet bring-up:

      [ 472.738363] LNet: 8481:0:(o2iblnd_cb.c:3395:kiblnd_check_conns()) Timed out tx for 10.37.248.232@o2ib1: 471 seconds
      [ 473.739295] LNetError: 2014:0:(o2iblnd.c:929:kiblnd_create_conn()) Can't create QP: -12, send_wr: 16317, recv_wr: 128, send_sge: 2, recv_sge: 1

      I found that lowering peer_credits works around this, but that is not the proper fix.
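The QP request in the log can be unpacked with some arithmetic. The Python sketch below is a hypothetical reconstruction for illustration, not code taken from o2iblnd: it assumes the send queue is sized as the connection's queue depth (derived from peer_credits) times one WR for the LNet message, plus one WR per RDMA fragment (256 assumed), plus two extra WRs for FastReg map/invalidate. Under those assumptions a queue depth of 63 reproduces the logged send_wr of 16317, and the -12 return code is -ENOMEM. Because the requested send queue scales linearly with the queue depth, this is consistent with lowering peer_credits making the error go away.

```python
import errno
import os


def send_wrs(queue_depth: int, max_frags: int = 256, fastreg: bool = True) -> int:
    """Hypothetical send-queue sizing (ASSUMPTION, not copied from o2iblnd).

    One WR for the LNet message plus one per RDMA fragment, plus two
    extra WRs (map + invalidate) when FastReg is used, for every
    in-flight transfer allowed by the connection's queue depth.
    """
    multiplier = 1 + max_frags
    if fastreg:
        multiplier += 2
    return multiplier * queue_depth


# "Can't create QP: -12": kernel calls return a negative errno value.
rc = -12
print(errno.errorcode[-rc], "-", os.strerror(-rc))  # ENOMEM - Cannot allocate memory (on Linux)

# A queue depth of 63 is one value consistent with "send_wr: 16317"
# under the assumed formula (63 * 259 == 16317).
print(send_wrs(63))  # 16317

# The requested send queue grows linearly with the queue depth.
for depth in (128, 63, 48):
    print(depth, send_wrs(depth))
```

Under the same assumptions, the peer_credits values in Jeff Johnson's comment (128 failing, 48 working) would request 33152 and 12432 send WRs respectively. The actual limit at which the device returns -ENOMEM is not stated in this ticket.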

          Activity

            "Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/45901
            Subject: LU-12901 o2iblnd: retry qp creation with reduced queue depth
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set: 1
            Commit: 9e0736f2306286f2f2c653c4e06c17d2201d1c0f

            Peter Jones made changes:
            Resolution: Fixed
            Status: changed from Open to Resolved
            Peter Jones added a comment:

            Landed for 2.14

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/40748/
            Subject: LU-12901 o2iblnd: retry qp creation with reduced queue depth
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 8a3ef5713cc4aed1ac7bd3ce177895caa597cc4c
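The patch subject, "retry qp creation with reduced queue depth", names the approach taken: rather than failing the connection when the QP cannot be created, retry with a smaller queue depth. The Python sketch below simulates that idea under stated assumptions; it is not the merged C code. `fake_create_qp`, the per-device WR limit, the sizing formula, stepping down by one per attempt, and the floor of 2 for the queue depth are all hypothetical stand-ins chosen for illustration.

```python
import errno

MAX_FRAGS = 256   # ASSUMPTION: RDMA fragments per transfer
FASTREG_WRS = 2   # ASSUMPTION: extra map/invalidate WRs with FastReg


def send_wrs(queue_depth: int) -> int:
    """Hypothetical send-queue size for a given queue depth."""
    return (1 + MAX_FRAGS + FASTREG_WRS) * queue_depth


def fake_create_qp(max_send_wr: int, device_limit: int) -> int:
    """Hypothetical stand-in for QP creation: -ENOMEM if the queue is too big."""
    return 0 if max_send_wr <= device_limit else -errno.ENOMEM


def create_qp_with_retry(queue_depth: int, device_limit: int, min_depth: int = 2):
    """Retry QP creation, lowering the queue depth by one after each -ENOMEM.

    Returns (rc, queue_depth, max_send_wr) from the last attempt.
    """
    while True:
        max_send_wr = send_wrs(queue_depth)
        rc = fake_create_qp(max_send_wr, device_limit)
        # Stop on success, on any error other than -ENOMEM, or at the floor.
        if rc != -errno.ENOMEM or queue_depth <= min_depth:
            return rc, queue_depth, max_send_wr
        queue_depth -= 1


# Asking for depth 128 against an invented limit of 16000 WRs settles on 61.
print(create_qp_with_retry(128, device_limit=16000))  # (0, 61, 15799)
# If even the minimum depth does not fit, the error is still reported.
print(create_qp_with_retry(4, device_limit=100))
```

Stepping down one at a time ends at the largest depth that fits; a binary search would need fewer attempts at the cost of more code. Either way, a reduced queue depth means fewer in-flight transfers on that connection, trading some headroom for a connection that comes up instead of failing.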

            Serguei Smirnov (ssmirnov@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/40748
            Subject: LU-12901 o2iblnd: retry qp creation with reduced queue depth
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: fe4fcd922196355b08981d9015f1635c88904fd3

            Serguei Smirnov made changes:
            Link: This issue is related to LU-10213
            Serguei Smirnov made changes:
            Link: This issue is related to LU-7124
            Karsten Weiss added a comment (edited):

            May I suggest changing the "Affects Version/s" attribute of this bug from 2.13.0 to 2.12.x (including 2.12.5, which is an LTS release)? See, for example, the comments here or the reports on lustre-discuss.

            Jeff Johnson added a comment:

            I see this issue as well: CentOS 7.8, Lustre 2.13.0, MOFED 5.0-2.1.8, ConnectX-6. Setting peer_credits to 128 fails as described; lowering peer_credits to 48 results in a functioning LNet.

            Michael Neff added a comment:

            I also see this on a Lustre client running CentOS 7.7 with a ConnectX-6 adapter, using Mellanox OFED 4.7.

            People

              Assignee: Serguei Smirnov
              Reporter: James A Simmons
              Votes: 1
              Watchers: 11
