[LU-9943] LU-7124 caused connection problems under load. Created: 04/Sep/17  Updated: 28/Mar/18  Resolved: 28/Mar/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0, Lustre 2.9.0, Lustre 2.10.0
Fix Version/s: Lustre 2.11.0

Type: Bug Priority: Critical
Reporter: Alexey Lyashkov Assignee: Amir Shehata (Inactive)
Resolution: Fixed Votes: 0
Labels: lnet

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

LU-7124 is a completely incorrect patch.
It decreases the size of the WR array in the kernel, but the o2ib LND still makes assumptions based on its own queue depth.
This leads to the following situation:

Aug  8 21:53:00 lstr1n07 kernel: mlx5_warn:mlx5_0:mlx5_ib_post_send:4184:(pid 26272): Failed to prepare WQE
Aug  8 21:53:00 lstr1n07 kernel: mlx5_warn:mlx5_0:begin_wqe:4085:(pid 9590): work queue overflow

after several ENOMEM hits.
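
The mismatch described above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not the o2iblnd code: every name here (model_qp, model_create_qp, model_create_qp_shrinking, model_post_send, model_burst), the shrink step, and the numeric limits are hypothetical and chosen for illustration. It shows a queue whose send depth is reduced after ENOMEM during creation, while the sender either keeps its originally assumed depth or uses the depth that was actually granted.

```c
#include <errno.h>

/* Hypothetical stand-in for a queue pair (not a real verbs structure). */
struct model_qp {
    int capacity;     /* send WRs actually granted at creation */
    int outstanding;  /* send WRs currently posted */
};

/* Models creation failing with -ENOMEM when too many WRs are requested.
 * hw_limit is an assumed provider limit, used only for illustration. */
static int model_create_qp(struct model_qp *qp, int max_send_wr, int hw_limit)
{
    if (max_send_wr > hw_limit)
        return -ENOMEM;
    qp->capacity = max_send_wr;
    qp->outstanding = 0;
    return 0;
}

/* Retries with a smaller WR count after each -ENOMEM.
 * Returns the granted depth on success, or a negative errno. */
static int model_create_qp_shrinking(struct model_qp *qp, int wanted,
                                     int hw_limit)
{
    int wr = wanted;
    int rc;

    while ((rc = model_create_qp(qp, wr, hw_limit)) == -ENOMEM && wr >= 16)
        wr -= wr / 4;   /* assumed shrink step: drop a quarter each time */

    return rc ? rc : wr;
}

/* Models posting one send; fails once the queue is full. */
static int model_post_send(struct model_qp *qp)
{
    if (qp->outstanding >= qp->capacity)
        return -ENOMEM;   /* the "work queue overflow" case */
    qp->outstanding++;
    return 0;
}

/* Posts assumed_depth sends without completing any of them and
 * returns how many of those posts failed. */
static int model_burst(struct model_qp *qp, int assumed_depth)
{
    int failures = 0;

    for (int i = 0; i < assumed_depth; i++)
        if (model_post_send(qp) != 0)
            failures++;
    return failures;
}
```

In this model, a requested depth of 256 against an assumed limit of 150 settles on a granted depth of 144 after two ENOMEM hits. A sender that still assumes 256 sees 112 failed posts, while one that bounds itself by the granted value sees none.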



 Comments   
Comment by Alexey Lyashkov [ 04/Sep/17 ]

A patch will be sent soon.

Comment by Gerrit Updater [ 04/Sep/17 ]

Alexey Lyashkov (alexey.lyashkov@seagate.com) uploaded a new patch: https://review.whamcloud.com/28850
Subject: LU-9943 lnet: fix queue size in ENOMEM case.
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 49fb397102a7c3ac3330e88c0839a9c3323e2471

Comment by Gerrit Updater [ 04/Sep/17 ]

Alexey Lyashkov (alexey.lyashkov@seagate.com) uploaded a new patch: https://review.whamcloud.com/28851
Subject: LU-9943 lnet: fix WR accounting for the FastReg mode
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 0cdc93abd95bab22ca08b34a9f5ebde868ab8f42

Comment by Peter Jones [ 11/Sep/17 ]

Amir

Could you please review these proposed changes?

Thanks

Peter

Comment by Gerrit Updater [ 29/Nov/17 ]

Amir Shehata (amir.shehata@intel.com) uploaded a new patch: https://review.whamcloud.com/30311
Subject: LU-9943 lnd: correct WR fast reg accounting
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 3b9e1cd8fba97cf1e00631a17855d151d07f50e8

Comment by Gerrit Updater [ 22/Dec/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/30311/
Subject: LU-9943 lnd: correct WR fast reg accounting
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 9a379e3226f1ec3be1a628f11a87e34f5d53abd4

Comment by Joseph Gmitter (Inactive) [ 28/Mar/18 ]

Do we need https://review.whamcloud.com/28851 or https://review.whamcloud.com/28850 after the landing of Amir's patch?  If not, we can resolve this ticket for 2.11.0.

Comment by Alexey Lyashkov [ 28/Mar/18 ]

Both patches can be dropped, as they have been superseded by Amir's fixes.

Comment by Joseph Gmitter (Inactive) [ 28/Mar/18 ]

Thank you for the fast response!
