Details
Bug
Resolution: Fixed
Minor
Description
The following assertion was triggered on one of our clusters:
socklnd_cb.c:1950:ksocknal_connect()) ASSERTION( (wanted & ((((1UL))) << (3))) != 0 ) failed:
socklnd_cb.c:1950:ksocknal_connect()) LBUG
From crash dumps, we can see that the conn_cb has been set with:
struct ksock_conn_cb {
...
ksnr_scheduled = 1,
ksnr_connecting = 1,
ksnr_connected = 10,
ksnr_deleted = 0,
ksnr_ctrl_conn_count = 1,
ksnr_blki_conn_count = 1,
ksnr_blko_conn_count = 0,
ksnr_conn_count = 2,
ksnr_max_conns = 8,
ksnr_busy_retry_count = 3
}
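To make the dump easier to relate to the failed assertion, here is a minimal sketch of the bit arithmetic involved. It is not the socklnd code: the enum names, their values, and the rule used to derive `wanted` (required types minus the types already marked in ksnr_connected) are assumptions made for illustration, chosen so that bit 3 lines up with the `(1UL) << (3)` term in the assertion.

```c
#include <assert.h>

/* Hypothetical stand-ins for connection types. The values are
 * assumptions for this sketch, not copied from the Lustre headers. */
enum conn_type {
	CONN_ANY      = 0,
	CONN_CONTROL  = 1,
	CONN_BULK_IN  = 2,
	CONN_BULK_OUT = 3,
};

/* Same shape as the expanded macro in the assertion: 1UL << bit. */
static unsigned long bit_of(unsigned int bit)
{
	assert(bit < 8 * sizeof(unsigned long));
	return 1UL << bit;
}

/* Assumed rule: the types still wanted are the required types that
 * are not yet marked as connected. */
static unsigned long wanted_mask(unsigned long required,
				 unsigned long connected)
{
	return required & ~connected;
}

/* Non-zero when the condition checked by the assertion would hold. */
static int assertion_holds(unsigned long wanted)
{
	return (wanted & bit_of(CONN_BULK_OUT)) != 0;
}
```

With ksnr_connected = 10 (binary 1010), bits 1 and 3 are already set. Under the assumed rule, and with bits 1 to 3 required (mask 14), `wanted_mask(14, 10)` leaves only bit 2, so `assertion_holds()` returns 0. That matches an assertion on bit 3 failing. The sketch does not try to explain how the connection race leaves the conn_cb in this state.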
The debug log shows that the connection race between the two peers was hit three times, which is why ksnr_busy_retry_count = 3 in the conn_cb.
hornc has suggested a fix for this, which we will submit shortly.
Activity
Link | New: This issue is related to DDN-5509 [ DDN-5509 ] |
Link | New: This issue is related to DDN-4560 [ DDN-4560 ] |
Link | New: This issue is related to DDN-4788 [ DDN-4788 ] |
Resolution | New: Fixed [ 1 ] | |
Status | Original: Reopened [ 4 ] | New: Resolved [ 5 ] |
Link | New: This issue is related to DDN-4734 [ DDN-4734 ] |
Assignee | Original: Nikitas Angelinas [ nangelinas ] | New: Serguei Smirnov [ ssmirnov ] |
Resolution | Original: Fixed [ 1 ] | |
Status | Original: Resolved [ 5 ] | New: Reopened [ 4 ] |
Link | New: This issue is related to DDN-4704 [ DDN-4704 ] |
Fix Version/s | New: Lustre 2.16.0 [ 15190 ] | |
Resolution | New: Fixed [ 1 ] | |
Status | Original: In Progress [ 3 ] | New: Resolved [ 5 ] |