
o2iblnd: graceful handling of unexpected CM_EVENT_CONNECT_ERROR

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.16.0

    Description

      Field reports from RoCE setups show that RDMA_CM_EVENT_CONNECT_ERROR may be received while the connection is in neither the IBLND_CONN_ACTIVE_CONNECT nor the IBLND_CONN_PASSIVE_WAIT state.

      This causes the assertion in kiblnd_cm_callback() to fail:

       ASSERTION( conn->ibc_state == 1 || conn->ibc_state == 2 )

      The proposal is to handle this more gracefully: report the event as unexpected and allow the flow to continue. If the connection really does have a problem, it is expected to report transaction errors and be cleaned up without crashing the whole system.
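
      The proposed behavior can be sketched in plain userspace C. This is only an illustration of the idea, not the landed patch: the enum names, the action type and on_connect_error() are hypothetical, and the state values other than 1 and 2 (the two values named in the assertion) are assumptions.

```c
#include <stdio.h>

/* Connection states. Values 1 and 2 mirror the two states checked by the
 * assertion; the remaining names and values are assumptions. */
enum conn_state {
    CONN_INIT           = 0,
    CONN_ACTIVE_CONNECT = 1,
    CONN_PASSIVE_WAIT   = 2,
    CONN_ESTABLISHED    = 3,
};

/* What the (hypothetical) callback decides to do with a CONNECT_ERROR. */
enum connect_error_action {
    ACTION_FAIL_CONNECT,      /* expected state: run the normal connect-failure path */
    ACTION_WARN_AND_CONTINUE, /* unexpected state: log the event and carry on */
};

/* Instead of asserting on the state, report an unexpected event and let the
 * flow continue; a genuinely broken connection is then expected to be torn
 * down later through ordinary transaction errors. */
static enum connect_error_action
on_connect_error(enum conn_state state)
{
    if (state == CONN_ACTIVE_CONNECT || state == CONN_PASSIVE_WAIT)
        return ACTION_FAIL_CONNECT;

    fprintf(stderr, "unexpected CONNECT_ERROR in conn state %d\n", (int)state);
    return ACTION_WARN_AND_CONTINUE;
}
```

      With this shape, a CONNECT_ERROR arriving on an already established connection produces a warning rather than a failed assertion.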

          Activity

            [LU-17632] o2iblnd: graceful handling of unexpected CM_EVENT_CONNECT_ERROR

            "Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/56521
            Subject: LU-17632 o2iblnd: graceful handling of CM_EVENT_CONNECT_ERROR
            Project: fs/lustre-release
            Branch: b2_15
            Current Patch Set: 1
            Commit: bfc7e4cd3670414c87aaa3f3c73c70bde41bf52a

            pjones Peter Jones added a comment -

            Landed for 2.16

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/54353/
            Subject: LU-17632 o2iblnd: graceful handling of CM_EVENT_CONNECT_ERROR
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 7f27a2fceef9a03d3ada74e258e774c8f5d420f0

            "Serguei Smirnov <ssmirnov@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/54353
            Subject: LU-17632 o2iblnd: graceful handling of CM_EVENT_CONNECT_ERROR
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 2d108ce3ee3e06ea0fd77cde2c0c03e32a370a82

            ssmirnov Serguei Smirnov added a comment -

            Note that https://review.whamcloud.com/#/c/fs/lustre-release/+/53986/ (LU-17480) addresses the same issue in a more systematic way.

            People

              Assignee: ssmirnov Serguei Smirnov
              Reporter: ssmirnov Serguei Smirnov