[LU-8782] queue depth too large: 63 (<=16 wanted) Created: 31/Oct/16  Updated: 30/Jun/17  Resolved: 30/Jun/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.7.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Mahmoud Hanafi Assignee: Amir Shehata (Inactive)
Resolution: Not a Bug Votes: 0
Labels: None

Attachments: File debug.out    
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

We would like clarification on our configuration, and to verify it, when using the patch from LU-3322.

We are running 2.7.2 with the patch from LU-3322. When configuring two hosts with different peer_credits values, we get this error when the initial connection is made:

LNetError: 81:0:(o2iblnd_cb.c:2325:kiblnd_passive_connect()) Can't accept conn from xxx.xxx.xxx.xxx@o2ib, queue depth too large:  63 (<=16 wanted)

Although we get the error, lnet_selftest is working.

Host1:

options ko2iblnd timeout=150 retry_count=7 peer_timeout=0 peer_credits=63 concurrent_sends=63

Host2:

options ko2iblnd timeout=150 retry_count=7 peer_credits=16 concurrent_sends=63
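The two option lines above can be compared mechanically to see which settings differ between the hosts. The following is only a minimal sketch: the helper names parse_ko2iblnd_options and differing_options are hypothetical and are not part of Lustre or any of its tools.

```python
def parse_ko2iblnd_options(line):
    """Parse an 'options ko2iblnd key=value ...' line into a dict of ints.

    Hypothetical helper for illustration; not a Lustre utility.
    """
    tokens = line.split()
    if tokens[:2] != ["options", "ko2iblnd"]:
        raise ValueError("not a ko2iblnd options line")
    return {key: int(value)
            for key, value in (tok.split("=", 1) for tok in tokens[2:])}


def differing_options(a, b):
    """Return {name: (value_a, value_b)} for options that differ.

    An option missing on one side is reported as None on that side.
    """
    names = sorted(set(a) | set(b))
    return {n: (a.get(n), b.get(n)) for n in names if a.get(n) != b.get(n)}


# Option lines copied from the description of this ticket.
host1 = parse_ko2iblnd_options(
    "options ko2iblnd timeout=150 retry_count=7 peer_timeout=0 "
    "peer_credits=63 concurrent_sends=63"
)
host2 = parse_ko2iblnd_options(
    "options ko2iblnd timeout=150 retry_count=7 peer_credits=16 "
    "concurrent_sends=63"
)

diff = differing_options(host1, host2)
for name, (v1, v2) in diff.items():
    print(f"{name}: Host1={v1} Host2={v2}")
```

For these two lines the sketch reports peer_credits (63 on Host1, 16 on Host2) and peer_timeout (set only on Host1). The peer_credits values match the 63 and 16 that appear in the error message.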

Attaching debug output from Host2.



 Comments   
Comment by Joseph Gmitter (Inactive) [ 01/Nov/16 ]

Hi Doug,

Any advice for this issue?

Thanks.
Joe

Comment by Doug Oucharek (Inactive) [ 02/Nov/16 ]

Looking at the code for LU-3322, when a connection attempt is rejected because the queue depth is too large, a new connection attempt should be made at the lower queue depth value.

So, when you see this error message, do the two nodes still end up with a connection, and does everything work fine? If so, then we should just turn this error message down to be informational.
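The reject-and-retry behaviour described above can be modelled roughly as follows. This is a simplified sketch in Python, not the o2iblnd code: the function names, the two-attempt limit, and the assumption that the rejecting side reports the depth it wants are all illustrative.

```python
def passive_connect(requested_depth, local_depth):
    """Model of the accepting side.

    Returns (accepted, depth): on rejection, depth is the value the
    accepting side wants (an assumption made for this sketch).
    """
    if requested_depth > local_depth:
        return False, local_depth
    return True, requested_depth


def active_connect(local_depth, peer_depth, max_attempts=2):
    """Model of the connecting side: retry once at the lower depth."""
    depth = local_depth
    errors = []
    for _ in range(max_attempts):
        accepted, wanted = passive_connect(depth, peer_depth)
        if accepted:
            return depth, errors
        # The first attempt is rejected and logged, as in the report.
        errors.append(f"queue depth too large: {depth} (<={wanted} wanted)")
        depth = min(depth, wanted)
    raise ConnectionError("could not agree on a queue depth")


# Values from this ticket: one side uses 63, the other 16.
depth, errors = active_connect(local_depth=63, peer_depth=16)
print(depth)   # negotiated queue depth
print(errors)  # messages produced by the rejected first attempt
```

In this model the first attempt at 63 is rejected and produces the message, and the retry at 16 succeeds, which would be consistent with lnet_selftest working despite the error.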

Comment by Mahmoud Hanafi [ 29/Jun/17 ]

Please close this case. This error is well understood.

Comment by Minh Diep [ 30/Jun/17 ]

Thanks
