Lustre / LU-17258

socklnd connection type not established upon connection race

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.16.0

    Description

      The following assertion was triggered on one of our clusters:

      socklnd_cb.c:1950:ksocknal_connect()) ASSERTION( (wanted & ((((1UL))) << (3))) != 0 ) failed:
      socklnd_cb.c:1950:ksocknal_connect()) LBUG

      The crash dump shows the conn_cb in the following state:

      struct ksock_conn_cb {
      ...
      ksnr_scheduled = 1,
      ksnr_connecting = 1,
      ksnr_connected = 10,
      ksnr_deleted = 0,
      ksnr_ctrl_conn_count = 1,
      ksnr_blki_conn_count = 1,
      ksnr_blko_conn_count = 0,
      ksnr_conn_count = 2,
      ksnr_max_conns = 8,
      ksnr_busy_retry_count = 3
      }
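
      To relate the dump to the assertion, the sketch below decodes a bitmask such as ksnr_connected = 10 and evaluates the asserted expression. Only bit 3 is taken from the assertion text. The type names, the other bit positions, and the rule that `wanted` is "all types not yet marked connected" are assumptions made for illustration, not the socklnd source.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit positions for connection types. Only bit 3 is attested by the
 * assertion text; the names and the other positions are assumptions. */
enum conn_type_bit {
	CONN_CONTROL  = 1,	/* assumed */
	CONN_BULK_IN  = 2,	/* assumed */
	CONN_BULK_OUT = 3,	/* bit tested by the failed assertion */
};

/* Mask of every type in the assumed enum above: bits 1..3 = 0xE. */
#define ALL_TYPES_MASK ((1UL << CONN_CONTROL) | \
			(1UL << CONN_BULK_IN) | \
			(1UL << CONN_BULK_OUT))

/* Return true if the given bit position is set in mask. */
static bool bit_is_set(unsigned long mask, unsigned int bit)
{
	assert(bit < 8 * sizeof(mask));
	return (mask & (1UL << bit)) != 0;
}

/* Assumption: "wanted" holds the types not yet marked in "connected". */
static unsigned long wanted_from_connected(unsigned long connected)
{
	return ALL_TYPES_MASK & ~connected;
}

/* The expression quoted in the assertion message, unchanged. */
static bool assertion_holds(unsigned long wanted)
{
	return (wanted & ((((1UL))) << (3))) != 0;
}
```

      With the dumped value, bit_is_set(10, 1) and bit_is_set(10, 3) are true while bit_is_set(10, 2) is false. Under the stated assumptions, wanted_from_connected(10) is 0x4, so assertion_holds() returns false, which is consistent with the reported LBUG. If bit 3 does denote the bulk-out type, it is marked as connected even though ksnr_blko_conn_count is 0, which may be the mismatch the issue title refers to.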

      The debug log shows that a connection race between the two peers was hit three times, which accounts for ksnr_busy_retry_count = 3 in the conn_cb.

      hornc has suggested a fix for this, which we will be submitting in a bit.

          Activity

            adilger Andreas Dilger made changes -
            Link New: This issue is related to DDN-5509 [ DDN-5509 ]
            adilger Andreas Dilger made changes -
            Link New: This issue is related to DDN-4560 [ DDN-4560 ]
            adilger Andreas Dilger made changes -
            Link New: This issue is related to DDN-4788 [ DDN-4788 ]
            pjones Peter Jones made changes -
            Resolution New: Fixed [ 1 ]
            Status Original: Reopened [ 4 ] New: Resolved [ 5 ]
            pjones Peter Jones made changes -
            Link New: This issue is related to DDN-4734 [ DDN-4734 ]
            adilger Andreas Dilger made changes -
            Link New: This issue is related to LU-17515 [ LU-17515 ]
            adilger Andreas Dilger made changes -
            Link New: This issue is related to LU-17513 [ LU-17513 ]
            adilger Andreas Dilger made changes -
            Assignee Original: Nikitas Angelinas [ nangelinas ] New: Serguei Smirnov [ ssmirnov ]
            Resolution Original: Fixed [ 1 ]
            Status Original: Resolved [ 5 ] New: Reopened [ 4 ]
            pjones Peter Jones made changes -
            Link New: This issue is related to DDN-4704 [ DDN-4704 ]
            pjones Peter Jones made changes -
            Fix Version/s New: Lustre 2.16.0 [ 15190 ]
            Resolution New: Fixed [ 1 ]
            Status Original: In Progress [ 3 ] New: Resolved [ 5 ]

            People

              Assignee: ssmirnov Serguei Smirnov
              Reporter: nangelinas Nikitas Angelinas