[LU-15885] o2iblnd: RDMA_CM_EVENT_UNREACHABLE may be received after conn clean-up
Created: 24/May/22  Updated: 14/Oct/22  Resolved: 10/Oct/22

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.16.0

Type: Bug
Priority: Minor
Reporter: Serguei Smirnov
Assignee: Serguei Smirnov
Resolution: Fixed
Votes: 0
Labels: o2iblnd

Severity: 3

 Description   

In one scenario, an IB port going down triggers the following assertion:

        case RDMA_CM_EVENT_UNREACHABLE:
                conn = cmid->context;
                LASSERT(conn->ibc_state == IBLND_CONN_ACTIVE_CONNECT ||
                        conn->ibc_state == IBLND_CONN_PASSIVE_WAIT);

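The assertion fails because the connection has already been disconnected due to an earlier "RDMA Timeout", so conn->ibc_state is no longer one of the two states the assertion expects.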

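Since it appears to be possible to receive RDMA_CM_EVENT_UNREACHABLE after the decision to close the connection has already been made, the handler should not assume that the connection is still in one of these two states, and this code should be changed.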
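
As a minimal sketch of one way a handler could tolerate a late event, the connection state can be checked with an ordinary conditional instead of an assertion. This is not the merged patch, whose contents are not shown in this ticket. Only IBLND_CONN_ACTIVE_CONNECT and IBLND_CONN_PASSIVE_WAIT come from the snippet above; the other state names, struct sketch_conn, sketch_fail_connreq() and sketch_handle_unreachable() are hypothetical stand-ins so that the example compiles on its own in user space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Connection states. Only ACTIVE_CONNECT and PASSIVE_WAIT appear in the
 * ticket; the remaining names are assumed for illustration. */
enum conn_state {
	IBLND_CONN_INIT,
	IBLND_CONN_ACTIVE_CONNECT,
	IBLND_CONN_PASSIVE_WAIT,
	IBLND_CONN_ESTABLISHED,
	IBLND_CONN_CLOSING,
	IBLND_CONN_DISCONNECTED,
};

/* Hypothetical stand-in for the driver's connection structure. */
struct sketch_conn {
	enum conn_state ibc_state;
	int connreq_failures;	/* counts simulated connect failures */
};

/* Hypothetical stand-in for failing the pending connection request. */
void sketch_fail_connreq(struct sketch_conn *conn)
{
	conn->connreq_failures++;
	conn->ibc_state = IBLND_CONN_DISCONNECTED;
}

/*
 * Handle an "unreachable" event without asserting on the state.
 * Returns true if the event failed a connection attempt that was still
 * in progress, false if the connection had already moved on and the
 * event was ignored.
 */
bool sketch_handle_unreachable(struct sketch_conn *conn)
{
	assert(conn != NULL);

	if (conn->ibc_state == IBLND_CONN_ACTIVE_CONNECT ||
	    conn->ibc_state == IBLND_CONN_PASSIVE_WAIT) {
		sketch_fail_connreq(conn);
		return true;
	}

	/* Late event: clean-up has already started, nothing to do. */
	return false;
}
```

With this shape, an event that arrives while the connection is still being set up fails the attempt as before, while an event that arrives after an earlier timeout has already disconnected the connection is ignored rather than crashing the node. Whether a real handler should also log the late event or assert on other states is a design decision this sketch leaves open.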



 Comments   
Comment by Gerrit Updater [ 08/Sep/22 ]

"Serguei Smirnov <ssmirnov@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/48492
Subject: LU-15885 o2iblnd: fix handling of RDMA_CM_EVENT_UNREACHABLE
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 365cd8cb80536a1434b62abccc50d89cc563b168

Comment by Gerrit Updater [ 10/Oct/22 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/48492/
Subject: LU-15885 o2iblnd: fix handling of RDMA_CM_EVENT_UNREACHABLE
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 3925b1669d519e6c038ecce1287c1ced3de623d3

Comment by Peter Jones [ 10/Oct/22 ]

Landed for 2.16
