[LU-17258] socklnd connection type not established upon connection race | Created: 02/Nov/23 | Updated: 07/Feb/24 |
|
| Status: | Reopened |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.16.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Nikitas Angelinas | Assignee: | Serguei Smirnov |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None |
| Issue Links: |
|
| Severity: | 3 |
| Description |
|
The following assertion was triggered on one of our clusters:
From crash dumps, we can see that the conn_cb has been set with:
The debug log shows that the connection race between the two peers was hit three times, which left ksnr_busy_retry_count = 3 in the conn_cb. hornc has suggested a fix for this, which we will submit shortly. |
| Comments |
| Comment by Gerrit Updater [ 02/Nov/23 ] |
|
"Nikitas Angelinas <nikitas.angelinas@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/52957 |
| Comment by Gerrit Updater [ 08/Nov/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/52957/ |
| Comment by Peter Jones [ 09/Nov/23 ] |
|
Landed for 2.16 |
| Comment by Gerrit Updater [ 07/Feb/24 ] |
|
"Serguei Smirnov <ssmirnov@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/53955 |