Lustre / LU-15860

ksocknal_add_peer() race results in extra ksock_conn_cb


    Description

      There appears to be a race in which two ksock_conn_cb structures are created for the same peer_ni:

      Bad conn_cb (thread 4361; overwritten and never freed)
      00000800:00000010:19.0:1652717291.344361:0:4361:0:(socklnd.c:170:ksocknal_create_peer()) alloc '(peer_ni)': 240 at ffff9215aa86b700 (tot 41715364).
      00000800:00000010:19.0:1652717291.344362:0:4361:0:(socklnd.c:119:ksocknal_create_conn_cb()) alloc '(conn_cb)': 200 at ffff9215aa86b600 (tot 41715564).
      00000800:00000200:19.1:1652717291.344363:0:4361:0:(socklnd.c:556:ksocknal_add_conn_cb_locked()) pre ffff9215aa86b700 1
      00000800:00000200:19.1:1652717291.344364:0:4361:0:(socklnd.c:556:ksocknal_add_conn_cb_locked()) post ffff9215aa86b700 2
      00000800:00000200:19.0:1652717291.344365:0:4361:0:(socklnd.c:239:ksocknal_find_peer_locked()) got peer_ni [ffff9215aa86b700] -> 12345-172.18.2.8@tcp (2)
      00000800:00000200:19.1:1652717291.344366:0:4361:0:(socklnd.c:239:ksocknal_find_peer_locked()) got peer_ni [ffff9215aa86b700] -> 12345-172.18.2.8@tcp (2)
      00000800:00000200:19.1:1652717291.344367:0:4361:0:(socklnd_cb.c:645:ksocknal_launch_connection_locked()) pre ffff9215aa86b600 1
      00000800:00000200:19.1:1652717291.344368:0:4361:0:(socklnd_cb.c:645:ksocknal_launch_connection_locked()) post ffff9215aa86b600 2
      
      Good conn_cb (thread 4360; the one left attached to the peer_ni)
      00000800:00000010:16.0:1652717291.344365:0:4360:0:(socklnd.c:119:ksocknal_create_conn_cb()) alloc '(conn_cb)': 200 at ffff9215aca22600 (tot 41715508).
      ...
      00000800:00000200:16.1:1652717291.344371:0:4360:0:(socklnd.c:239:ksocknal_find_peer_locked()) got peer_ni [ffff9215aa86b700] -> 12345-172.18.2.8@tcp (2)
      00000800:00000200:16.1:1652717291.344375:0:4360:0:(socklnd.c:556:ksocknal_add_conn_cb_locked()) pre ffff9215aa86b700 2
      00000800:00000200:16.1:1652717291.344375:0:4360:0:(socklnd.c:556:ksocknal_add_conn_cb_locked()) post ffff9215aa86b700 3
      00000800:00000200:16.0:1652717291.344377:0:4360:0:(socklnd.c:239:ksocknal_find_peer_locked()) got peer_ni [ffff9215aa86b700] -> 12345-172.18.2.8@tcp (3)
      00000800:00000200:16.1:1652717291.344378:0:4360:0:(socklnd.c:239:ksocknal_find_peer_locked()) got peer_ni [ffff9215aa86b700] -> 12345-172.18.2.8@tcp (3)
      00000800:00000200:16.1:1652717291.344378:0:4360:0:(socklnd_cb.c:645:ksocknal_launch_connection_locked()) pre ffff9215aca22600 1
      00000800:00000200:16.1:1652717291.344379:0:4360:0:(socklnd_cb.c:645:ksocknal_launch_connection_locked()) post ffff9215aca22600 2
      

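      The second conn_cb overwrites the first in ksocknal_add_peer() -> ksocknal_add_conn_cb_locked(). Both threads take a reference on the same peer_ni (its refcount goes 1 -> 2 -> 3 in the traces above), but only the second conn_cb stays attached. The first conn_cb is orphaned and is never freed on shutdown.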
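      The following C sketch models this interleaving in userspace, assuming a simplified peer that holds a single conn_cb pointer and gains one reference per attached conn_cb. The structs, the fixed-size pool, and the functions attach_conn_cb_buggy(), attach_conn_cb_fixed(), peer_shutdown() and replay_race() are hypothetical stand-ins, not socklnd code; callers are assumed to hold the LND's global lock, which is not modelled. The "fixed" variant shows one possible way to avoid the overwrite (check the peer under the lock and drop the duplicate) and is not necessarily what the merged patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for ksock_conn_cb / ksock_peer_ni. */
struct conn_cb {
	bool in_use;
	int refcount;
};

struct peer {
	int refcount;
	struct conn_cb *conn_cb;	/* the peer tracks a single conn_cb */
};

/* Fixed pool instead of malloc(): a "leak" is observable but not real. */
#define POOL_SIZE 4
static struct conn_cb pool[POOL_SIZE];

static struct conn_cb *conn_cb_alloc(void)
{
	for (int i = 0; i < POOL_SIZE; i++) {
		if (!pool[i].in_use) {
			pool[i].in_use = true;
			pool[i].refcount = 1;
			return &pool[i];
		}
	}
	return NULL;
}

static void conn_cb_decref(struct conn_cb *cb)
{
	if (--cb->refcount == 0)
		cb->in_use = false;	/* "freed" */
}

static int live_conn_cbs(void)
{
	int n = 0;

	for (int i = 0; i < POOL_SIZE; i++)
		n += pool[i].in_use;
	return n;
}

/* Buggy attach (lock held): blindly installs cb, orphaning any
 * conn_cb a concurrent caller attached first. */
static void attach_conn_cb_buggy(struct peer *p, struct conn_cb *cb)
{
	p->conn_cb = cb;
	p->refcount++;		/* each attached conn_cb pins the peer */
}

/* Fixed attach (lock held): if a conn_cb is already attached, drop
 * the caller's new one instead of overwriting. */
static void attach_conn_cb_fixed(struct peer *p, struct conn_cb *cb)
{
	if (p->conn_cb != NULL) {
		conn_cb_decref(cb);
		return;
	}
	p->conn_cb = cb;
	p->refcount++;
}

/* Shutdown only releases the conn_cb the peer still points at. */
static void peer_shutdown(struct peer *p)
{
	if (p->conn_cb != NULL) {
		conn_cb_decref(p->conn_cb);
		p->conn_cb = NULL;
		p->refcount--;
	}
}

/* Replays the logged race: both callers allocate a conn_cb before
 * taking the lock, then attach in turn. Returns the number of conn_cbs
 * still allocated after shutdown; *peer_refs gets the final refcount. */
static int replay_race(bool use_fix, int *peer_refs)
{
	struct peer p = { .refcount = 1, .conn_cb = NULL };
	struct conn_cb *first, *second;

	memset(pool, 0, sizeof(pool));
	first = conn_cb_alloc();	/* like thread 4361 */
	second = conn_cb_alloc();	/* like thread 4360 */
	assert(first != NULL && second != NULL);

	if (use_fix) {
		attach_conn_cb_fixed(&p, first);
		attach_conn_cb_fixed(&p, second);
	} else {
		attach_conn_cb_buggy(&p, first);
		attach_conn_cb_buggy(&p, second);
	}
	peer_shutdown(&p);
	*peer_refs = p.refcount;
	return live_conn_cbs();
}
```

      In the buggy path the peer refcount rises 1 -> 2 -> 3, mirroring the pre/post values in the log, and replay_race(false, &refs) leaves one conn_cb allocated with refs == 2 (the orphan's reference). With the check in place, replay_race(true, &refs) returns 0 and refs is back to 1.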


        Activity


          "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/48911/
          Subject: LU-15860 socklnd: Duplicate ksock_conn_cb
          Project: fs/lustre-release
          Branch: b2_15
          Current Patch Set:
          Commit: ea34ee7b40271ec23b6d9ed916a43971dd73fad5


          "James Simmons <jsimmons@infradead.org>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/48911
          Subject: LU-15860 socklnd: Duplicate ksock_conn_cb
          Project: fs/lustre-release
          Branch: b2_15
          Current Patch Set: 1
          Commit: 04d225733104ad973fc4da82e9e4c8eed4677d8a

          Peter Jones added a comment:

          Landed for 2.16


          "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/47361/
          Subject: LU-15860 socklnd: Duplicate ksock_conn_cb
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: 0c91d49a44e1214b5c65d4a557f6969b3d217881


          "Chris Horn <chris.horn@hpe.com>" uploaded a new patch: https://review.whamcloud.com/47361
          Subject: LU-15860 socklnd: Duplicate ksock_conn_cb
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 1feb1708ac65b2aa89d20a06988734d2ff807ec7


          People

            Assignee: Chris Horn
            Reporter: Chris Horn
            Votes: 0
            Watchers: 3
