Details
Bug
Resolution: Fixed
Major
Lustre 2.13.0
None
Seen with newer Mellanox ConnectX-4 devices
3
Description
While bringing up a file system on our test bed with the latest Lustre version (2.13), I saw this new error during LNet startup:
[ 472.738363] LNet: 8481:0:(o2iblnd_cb.c:3395:kiblnd_check_conns()) Timed out tx for 10.37.248.232@o2ib1: 471 seconds
[ 473.739295] LNetError: 2014:0:(o2iblnd.c:929:kiblnd_create_conn()) Can't create QP: -12, send_wr: 16317, recv_wr: 128, send_sge: 2, recv_sge: 1
I found that lowering peer_credits works around this, but that is not the proper fix.
"Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/45901
Subject: LU-12901 o2iblnd: retry qp creation with reduced queue depth
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: 9e0736f2306286f2f2c653c4e06c17d2201d1c0f