Details
Type: Bug
Resolution: Fixed
Priority: Minor
Description
There appears to be a race in which two ksock_conn_cb structures can be created for the same peer_ni:
Bad conn_cb:
00000800:00000010:19.0:1652717291.344361:0:4361:0:(socklnd.c:170:ksocknal_create_peer()) alloc '(peer_ni)': 240 at ffff9215aa86b700 (tot 41715364).
00000800:00000010:19.0:1652717291.344362:0:4361:0:(socklnd.c:119:ksocknal_create_conn_cb()) alloc '(conn_cb)': 200 at ffff9215aa86b600 (tot 41715564).
00000800:00000200:19.1:1652717291.344363:0:4361:0:(socklnd.c:556:ksocknal_add_conn_cb_locked()) pre ffff9215aa86b700 1
00000800:00000200:19.1:1652717291.344364:0:4361:0:(socklnd.c:556:ksocknal_add_conn_cb_locked()) post ffff9215aa86b700 2
00000800:00000200:19.0:1652717291.344365:0:4361:0:(socklnd.c:239:ksocknal_find_peer_locked()) got peer_ni [ffff9215aa86b700] -> 12345-172.18.2.8@tcp (2)
00000800:00000200:19.1:1652717291.344366:0:4361:0:(socklnd.c:239:ksocknal_find_peer_locked()) got peer_ni [ffff9215aa86b700] -> 12345-172.18.2.8@tcp (2)
00000800:00000200:19.1:1652717291.344367:0:4361:0:(socklnd_cb.c:645:ksocknal_launch_connection_locked()) pre ffff9215aa86b600 1
00000800:00000200:19.1:1652717291.344368:0:4361:0:(socklnd_cb.c:645:ksocknal_launch_connection_locked()) post ffff9215aa86b600 2

Good conn_cb:
00000800:00000010:16.0:1652717291.344365:0:4360:0:(socklnd.c:119:ksocknal_create_conn_cb()) alloc '(conn_cb)': 200 at ffff9215aca22600 (tot 41715508).
...
00000800:00000200:16.1:1652717291.344371:0:4360:0:(socklnd.c:239:ksocknal_find_peer_locked()) got peer_ni [ffff9215aa86b700] -> 12345-172.18.2.8@tcp (2)
00000800:00000200:16.1:1652717291.344375:0:4360:0:(socklnd.c:556:ksocknal_add_conn_cb_locked()) pre ffff9215aa86b700 2
00000800:00000200:16.1:1652717291.344375:0:4360:0:(socklnd.c:556:ksocknal_add_conn_cb_locked()) post ffff9215aa86b700 3
00000800:00000200:16.0:1652717291.344377:0:4360:0:(socklnd.c:239:ksocknal_find_peer_locked()) got peer_ni [ffff9215aa86b700] -> 12345-172.18.2.8@tcp (3)
00000800:00000200:16.1:1652717291.344378:0:4360:0:(socklnd.c:239:ksocknal_find_peer_locked()) got peer_ni [ffff9215aa86b700] -> 12345-172.18.2.8@tcp (3)
00000800:00000200:16.1:1652717291.344378:0:4360:0:(socklnd_cb.c:645:ksocknal_launch_connection_locked()) pre ffff9215aca22600 1
00000800:00000200:16.1:1652717291.344379:0:4360:0:(socklnd_cb.c:645:ksocknal_launch_connection_locked()) post ffff9215aca22600 2
The second conn_cb overwrites the first in ksocknal_add_peer()->ksocknal_add_conn_cb_locked(). The first one is left stuck and is never freed on shutdown.
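For illustration only, here is a minimal userspace sketch of one way to avoid this kind of overwrite: allocate the new object outside the lock, then install it only if the slot is still empty, and free the loser. This is not the merged patch and not the socklnd code. The names demo_peer, demo_conn_cb, demo_peer_init and demo_add_conn_cb are hypothetical, and a pthread mutex stands in for whatever lock the kernel code actually holds.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; NOT the Lustre structures. */
struct demo_conn_cb {
	int refcount;
};

struct demo_peer {
	pthread_mutex_t lock;
	struct demo_conn_cb *conn_cb;	/* at most one per peer */
};

static void demo_peer_init(struct demo_peer *peer)
{
	assert(peer != NULL);
	pthread_mutex_init(&peer->lock, NULL);
	peer->conn_cb = NULL;
}

/*
 * Attach a conn_cb to the peer unless one is already attached.
 * Returns the conn_cb the peer ends up with, or NULL if allocation
 * failed and the peer had none.
 */
static struct demo_conn_cb *demo_add_conn_cb(struct demo_peer *peer)
{
	struct demo_conn_cb *new_cb;
	struct demo_conn_cb *winner;

	assert(peer != NULL);

	/* Allocate before taking the lock, as a racing caller would. */
	new_cb = calloc(1, sizeof(*new_cb));
	if (new_cb != NULL)
		new_cb->refcount = 1;

	pthread_mutex_lock(&peer->lock);
	if (peer->conn_cb == NULL && new_cb != NULL) {
		/* Slot was empty: install ours and give up local ownership. */
		peer->conn_cb = new_cb;
		new_cb = NULL;
	}
	winner = peer->conn_cb;
	pthread_mutex_unlock(&peer->lock);

	/* If another caller won the race, drop our copy instead of
	 * overwriting theirs. free(NULL) is a no-op. */
	free(new_cb);
	return winner;
}
```

With this shape, a second caller for the same peer gets back the conn_cb that is already installed, so no allocation is left without an owner at shutdown.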
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/48911/
Subject: LU-15860 socklnd: Duplicate ksock_conn_cb
Project: fs/lustre-release
Branch: b2_15
Current Patch Set:
Commit: ea34ee7b40271ec23b6d9ed916a43971dd73fad5