Details
Bug
Resolution: Fixed
Minor
Description
The following assertion was triggered on one of our clusters:
socklnd_cb.c:1950:ksocknal_connect()) ASSERTION( (wanted & ((((1UL))) << (3))) != 0 ) failed:
socklnd_cb.c:1950:ksocknal_connect()) LBUG
From crash dumps, we can see that the conn_cb has been set with:
struct ksock_conn_cb {
...
ksnr_scheduled = 1,
ksnr_connecting = 1,
ksnr_connected = 10,
ksnr_deleted = 0,
ksnr_ctrl_conn_count = 1,
ksnr_blki_conn_count = 1,
ksnr_blko_conn_count = 0,
ksnr_conn_count = 2,
ksnr_max_conns = 8,
ksnr_busy_retry_count = 3
}
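To make the numbers above easier to read, here is a minimal standalone sketch that decodes a mask into its set bits and applies the same test as the failed assertion, (wanted & (1UL << 3)) != 0. It is not code from socklnd_cb.c: the helper names conn_bit_set and print_conn_bits are hypothetical, and it assumes ksnr_connected is a per-connection-type bitmask indexed the same way as wanted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: true if bit `type` is set in `mask`.
 * Mirrors the shape of the failed check: (wanted & (1UL << 3)) != 0. */
bool conn_bit_set(unsigned long mask, unsigned int type)
{
	/* Guard against an out-of-range shift, which is undefined in C. */
	assert(type < sizeof(mask) * 8);
	return (mask & (1UL << type)) != 0;
}

/* Hypothetical helper: print which of the low four bits are set.
 * Four bits is an assumption about how many connection types exist. */
void print_conn_bits(const char *name, unsigned long mask)
{
	printf("%s = %lu:", name, mask);
	for (unsigned int bit = 0; bit < 4; bit++) {
		if (conn_bit_set(mask, bit))
			printf(" bit%u", bit);
	}
	printf("\n");
}
```

Calling print_conn_bits("ksnr_connected", 10) reports bits 1 and 3 as set, since 10 is 8 + 2. The assertion tests bit 3 (value 8) of wanted, so it fires whenever that bit is clear in wanted at that point.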
The debug log shows that the connection race between the two peers was hit three times, which is why ksnr_busy_retry_count is 3 in the conn_cb.
hornc has suggested a fix for this, which we will be submitting in a bit.