  Lustre / LU-9943

LU-7124 caused connection problems under load.

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.11.0
    • Affects Version/s: Lustre 2.8.0, Lustre 2.9.0, Lustre 2.10.0
    • Severity: 3

    Description

      LU-7124 is a completely incorrect patch.
      It reduces the size of the work request (WR) array allocated in the kernel, but the o2ib LND still assumes its own (original) queue depth.
      After several ENOMEM hits, this results in:

      Aug  8 21:53:00 lstr1n07 kernel: mlx5_warn:mlx5_0:mlx5_ib_post_send:4184:(pid 26272): Failed to prepare WQE
      Aug  8 21:53:00 lstr1n07 kernel: mlx5_warn:mlx5_0:begin_wqe:4085:(pid 9590): work queue overflow

        Activity

          jgmitter Joseph Gmitter (Inactive) added a comment - Thank you for the fast response!

          shadow Alexey Lyashkov added a comment - Both patches can be dropped, as they are replaced by Amir's fixes.

          jgmitter Joseph Gmitter (Inactive) added a comment - Do we need https://review.whamcloud.com/28851 or https://review.whamcloud.com/28850 after the landing of Amir's patch? If not, we can resolve this ticket for 2.11.0.

          Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/30311/
          Subject: LU-9943 lnd: correct WR fast reg accounting
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: 9a379e3226f1ec3be1a628f11a87e34f5d53abd4

          Amir Shehata (amir.shehata@intel.com) uploaded a new patch: https://review.whamcloud.com/30311
          Subject: LU-9943 lnd: correct WR fast reg accounting
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 3b9e1cd8fba97cf1e00631a17855d151d07f50e8

          pjones Peter Jones added a comment -

          Amir

          Could you please review these proposed changes?

          Thanks

          Peter

          Alexey Lyashkov (alexey.lyashkov@seagate.com) uploaded a new patch: https://review.whamcloud.com/28851
          Subject: LU-9943 lnet: fix WR accounting for the FastReg mode
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 0cdc93abd95bab22ca08b34a9f5ebde868ab8f42

          Alexey Lyashkov (alexey.lyashkov@seagate.com) uploaded a new patch: https://review.whamcloud.com/28850
          Subject: LU-9943 lnet: fix queue size in ENOMEM case.
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 49fb397102a7c3ac3330e88c0839a9c3323e2471

          shadow Alexey Lyashkov added a comment - The patch will be sent soon.

          People

            Assignee: Amir Shehata (Inactive)
            Reporter: Alexey Lyashkov
            Votes: 0
            Watchers: 6
