[LU-17258] socklnd connection type not established upon connection race

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.16.0
    • Severity: 3

    Description

      The following assertion was triggered on one of our clusters:

      socklnd_cb.c:1950:ksocknal_connect()) ASSERTION( (wanted & ((((1UL))) << (3))) != 0 ) failed:
      socklnd_cb.c:1950:ksocknal_connect()) LBUG

      The crash dump shows the conn_cb in the following state:

      struct ksock_conn_cb {
      ...
      ksnr_scheduled = 1,
      ksnr_connecting = 1,
      ksnr_connected = 10,
      ksnr_deleted = 0,
      ksnr_ctrl_conn_count = 1,
      ksnr_blki_conn_count = 1,
      ksnr_blko_conn_count = 0,
      ksnr_conn_count = 2,
      ksnr_max_conns = 8,
      ksnr_busy_retry_count = 3
      }

      The debug log shows that the two peers hit a connection race three times, which accounts for ksnr_busy_retry_count = 3 in the conn_cb.

      hornc has suggested a fix for this race, which we will submit shortly.

          Activity

            pjones Peter Jones added a comment -

            Landed for 2.16

            gerrit Gerrit Updater added a comment -

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/53955/
            Subject: LU-17258 socklnd: stop connecting on too many retries
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 02caf7170762d97dac4f367651addc7d90b6eb32

            gerrit Gerrit Updater added a comment -

            "Serguei Smirnov <ssmirnov@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/53955
            Subject: LU-17258 socklnd: stop connecting on too many retries
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: f3d666c9f05fa174365fdc3b032b84f50781f36c

            pjones Peter Jones added a comment -

            Landed for 2.16

            gerrit Gerrit Updater added a comment -

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/52957/
            Subject: LU-17258 socklnd: ensure connection type established upon race
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 5afe3b0538c533c3cca370bc9c0901abccca299a

            gerrit Gerrit Updater added a comment -

            "Nikitas Angelinas <nikitas.angelinas@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/52957
            Subject: LU-17258 socklnd: ensure connection type established upon race
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 188a3a633dd2df8084722f95772831f46064fc12

            People

              ssmirnov Serguei Smirnov
              nangelinas Nikitas Angelinas
              Votes: 0
              Watchers: 8
