Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.14.0
    • Affects Version/s: Lustre 2.13.0

    Description

      If a peer has discovery disabled then it will not consolidate peer
      NI information. This means we need to use a consistent source NI
      when sending to it just like we do for non-MR peers.
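      The "consistent source NI" requirement can be illustrated with a small,
      self-contained sketch. It is not the LNet selection code: the struct
      names, the pinned_src field, and the round-robin policy are all
      assumptions made for illustration. The idea shown is only that a peer
      not treated as MR gets the same local NI on every send, while an MR
      peer may be spread across NIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a peer (not the LNet peer struct). */
struct peer_route {
    bool multi_rail;  /* do we treat this peer as MR capable? */
    int  pinned_src;  /* local NI index pinned for this peer, -1 if none */
};

/* Hypothetical set of local NIs, identified by index 0..count-1. */
struct local_nis {
    size_t count;     /* number of local NIs, must be > 0 */
    size_t next;      /* round-robin cursor */
};

/*
 * Pick the local NI to send from.
 * - Non-MR peer: pin the first NI chosen and reuse it on every send,
 *   because the peer cannot consolidate our NIs into one peer.
 * - MR peer: rotate across the local NIs.
 */
static size_t select_src_ni(struct local_nis *nis, struct peer_route *peer)
{
    assert(nis != NULL && peer != NULL && nis->count > 0);

    if (!peer->multi_rail && peer->pinned_src >= 0)
        return (size_t)peer->pinned_src;

    size_t chosen = nis->next;
    nis->next = (nis->next + 1) % nis->count;

    if (!peer->multi_rail)
        peer->pinned_src = (int)chosen;

    return chosen;
}
```

      With this sketch, a peer that has discovery disabled would be given
      multi_rail = false, so every message to it leaves from the same NI.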

      A comment in lnet_discovery_event_reply() indicates that this was a
      known issue, but the situation is not handled properly.

      Do not assume peers are multi-rail capable when peer objects are
      allocated and initialized.

      Do not mark a peer as multi-rail capable unless all of the following
      conditions are satisfied:
      1. The peer has the MR feature flag set.
      2. The peer has discovery enabled.
      3. We have discovery enabled locally.
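      The three conditions above, together with the rule that a newly
      allocated peer is not assumed to be MR capable, can be sketched as
      follows. This is a minimal illustration, not the patch itself: the
      FEAT_* bits, struct peer_state, and both function names are
      hypothetical stand-ins for the real LNet definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bits, standing in for what a peer advertises. */
#define FEAT_MULTI_RAIL (1u << 0)  /* peer sets the MR feature flag */
#define FEAT_DISCOVERY  (1u << 1)  /* peer has discovery enabled */

/* Hypothetical, simplified peer object. */
struct peer_state {
    uint32_t features;    /* last feature bits reported by the peer */
    bool     multi_rail;  /* whether we treat the peer as MR capable */
};

/* A new peer starts out as non-MR; nothing is assumed up front. */
static void peer_init(struct peer_state *peer)
{
    assert(peer != NULL);
    peer->features = 0;
    peer->multi_rail = false;
}

/*
 * Record the peer's advertised features and re-evaluate MR capability.
 * The peer is MR capable only if all three conditions hold:
 *   1. the peer sets the MR feature flag,
 *   2. the peer has discovery enabled,
 *   3. discovery is enabled locally.
 */
static bool peer_update_mr(struct peer_state *peer, uint32_t features,
                           bool local_discovery)
{
    assert(peer != NULL);
    peer->features = features;
    peer->multi_rail = (features & FEAT_MULTI_RAIL) != 0 &&
                       (features & FEAT_DISCOVERY) != 0 &&
                       local_discovery;
    return peer->multi_rail;
}
```

      Because the flag is recomputed on every update, a peer that later
      turns discovery off would drop back to non-MR handling in this sketch.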

      Marked the ticket as critical because it can break setups where one side has discovery enabled and the other side has it disabled.

          Activity

            [LU-12889] Do not assume peers are MR capable

            gerrit Gerrit Updater added a comment -

            Amir Shehata (ashehata@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/40345
            Subject: LU-12889 lnet: Do not assume peers are MR capable
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set: 1
            Commit: 8cdd1b7b76fafaac7e14c0b9b468f01f8ea89cfe

            pjones Peter Jones added a comment -

            OK, thanks hornc. I've flagged that ticket for 2.14. It looks like both you and Amir have possible approaches for that ticket, but I'll leave the two of you to duke it out on which to use.

            hornc Chris Horn added a comment -

            pjones I don't think it needs to be reverted. The issue only impacts mixed MR/non-MR configurations so it shouldn't affect maloo testing. It should be sufficient to land the fix for LU-12955.

            pjones Peter Jones added a comment - edited

            Chris, it seems to have landed - should it be reverted? For future reference, it is safer to apply a -1 (that can later be removed) in Gerrit if you want to "hit the pause button" on something landing for the time being - you can't assume that the gatekeeper is reading every single JIRA ticket.

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36512/
            Subject: LU-12889 lnet: Do not assume peers are MR capable
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 3c580c93b8d3e94fac0ac2cf3cca2ff706c6497a

            hornc Chris Horn added a comment -

            I don't know if this patch should wait to land until a solution is found for https://jira.whamcloud.com/browse/LU-12955

            gerrit Gerrit Updater added a comment -

            Chris Horn (hornc@cray.com) uploaded a new patch: https://review.whamcloud.com/36512
            Subject: LU-12889 lnet: Do not assume peers are MR capable
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 4a448bf2e5de7675658d3c114c2b7af675b34e60

            People

              Assignee: hornc Chris Horn
              Reporter: hornc Chris Horn