[LU-10124] lnetctl: lnetctl import --add not importing peers correctly

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.12.0, Lustre 2.10.7
    • Affects Version/s: Lustre 2.10.0, Lustre 2.10.1, Lustre 2.11.0
    • Environment: CentOS 7.4

    Description

      When importing a YAML config file for peers, the import does not correctly set the Multi-Rail property when it is false.

      An example:

      peer:
          - primary nid: 10.112.1.60@o2ib8
            Multi-Rail: False
            peer ni:
              - nid: 10.112.1.60@o2ib8
                state: up
      

      When imported it results in a running config of:

      peer:
          - primary nid: 10.112.1.60@o2ib8
            Multi-Rail: True
            peer ni:
              - nid: 10.112.1.60@o2ib8
                state: up
      

      For our config this isn't an issue yet, but since we will have a mix of multi-rail and non-multi-rail nodes, it could become an issue moving forward.
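
      A quick way to see the behaviour (assuming the peer block above has been saved to a file, here called peer.yaml - the file name is just for illustration) is to import it and read the running config back:

        lnetctl import --add < peer.yaml
        lnetctl peer show        # Multi-Rail should read False here, but comes back True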

      Attachments

        Activity

          [LU-10124] lnetctl: lnetctl import --add not importing peers correctly

          gerrit Gerrit Updater added a comment -

          Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/32255/
          Subject: LU-10124 lnet: Correctly add peer MR value while importing
          Project: fs/lustre-release
          Branch: b2_10
          Current Patch Set:
          Commit: 8103e94c1bd3000bc25da0d05f0ef3cafa1f91fd
          pjones Peter Jones added a comment -

          Landed for 2.12


          gerrit Gerrit Updater added a comment -

          Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/31138/
          Subject: LU-10124 lnet: Correctly add peer MR value while importing
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: 99494a28e6afde7c59e7f03045e63028ece1064d

          sharmaso Sonia Sharma (Inactive) added a comment -

          Just pushed the back-ported patch for b2_10 to make it easy for you to apply the patch and test.
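
          For reference, one common way to pull such a change from Gerrit into a local b2_10 checkout is via the standard refs/changes path (the exact ref below assumes change 32255, patch set 1, and the clone URL is an assumption rather than something taken from the review page):

            # Fetch the review and apply it locally (ref path follows the usual Gerrit convention).
            git fetch https://review.whamcloud.com/fs/lustre-release refs/changes/55/32255/1
            git cherry-pick FETCH_HEAD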

          Sonia Sharma (sonia.sharma@intel.com) uploaded a new patch: https://review.whamcloud.com/32255
          Subject: LU-10124 lnet: Correctly add peer MR value while importing
          Project: fs/lustre-release
          Branch: b2_10
          Current Patch Set: 1
          Commit: c6fcf5a01fa4da0b026498b16927fa6c86cc1918

          gerrit Gerrit Updater added a comment - Sonia Sharma (sonia.sharma@intel.com) uploaded a new patch: https://review.whamcloud.com/32255 Subject: LU-10124 lnet: Correctly add peer MR value while importing Project: fs/lustre-release Branch: b2_10 Current Patch Set: 1 Commit: c6fcf5a01fa4da0b026498b16927fa6c86cc1918

          sharmaso Sonia Sharma (Inactive) added a comment -

          Hi Malcolm,

          Can you please attach the YAML file you are using for configuration? We can try to reproduce the issue using that file.

          Thanks

          mhaakddn Malcolm Haak - NCI (Inactive) added a comment -

          I think it's a race condition. It happens because we are importing ~4000 nodes while in production (so alongside normal discovery).

          I doubt you will trigger it with two.
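
          If it helps to reproduce at a similar scale, a rough sketch along these lines builds a few-thousand-entry non-MR peer file and imports it in one go (the NID range and the o2ib8 interface are placeholders, not our real fabric layout):

            # Sketch only: generate ~4000 single-NID, non-MR peer entries and import them.
            echo "peer:" > peers.yaml
            for i in $(seq 1 250); do
                for j in $(seq 1 16); do
                    nid="10.112.$i.$j@o2ib8"
                    {
                        echo "    - primary nid: $nid"
                        echo "      Multi-Rail: False"
                        echo "      peer ni:"
                        echo "        - nid: $nid"
                    } >> peers.yaml
                done
            done
            lnetctl import --add < peers.yaml
            # Count entries that came back as Multi-Rail: True; 0 means the flag was honoured.
            lnetctl peer show | grep -c "Multi-Rail: True"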
          sharmaso Sonia Sharma (Inactive) added a comment - - edited

          Oh okay. I noticed that I had never updated the patch that had the issue; I have just done that.

          Now, with the updated patch, I tried it on my system and could not replicate the issue:

          [root@lutfRtr1-linux ~]# lnetctl ping 10.211.55.9@tcp
          ping:
              - primary nid: 10.211.55.9@tcp
                Multi-Rail: False
                peer ni:
                  - nid: 10.211.55.9@tcp
          
          [root@lutfRtr1-linux lustre-release]# lnetctl peer add --prim_nid 10.9.60.24@tcp
          
          [root@lutfRtr1-linux lustre-release]# lnetctl peer show
          peer:
              - primary nid: 10.211.55.9@tcp
                Multi-Rail: False
                peer ni:
                  - nid: 10.211.55.9@tcp
                    state: NA
              - primary nid: 10.9.60.24@tcp
                Multi-Rail: True
                peer ni:
                  - nid: 10.9.60.24@tcp
                    state: NA
          
          [root@lutfRtr1-linux lustre-release]# lnetctl export > out.yaml
          
          [root@lutfRtr1-linux lustre-release]# lnetctl peer show
          peer:
          
          [root@lutfRtr1-linux lustre-release]# lnetctl import < out.yaml
          
          [root@lutfRtr1-linux lustre-release]# lnetctl peer show
          peer:
              - primary nid: 10.211.55.9@tcp
                Multi-Rail: False
                peer ni:
                  - nid: 10.211.55.9@tcp
                    state: NA
              - primary nid: 10.9.60.24@tcp
                Multi-Rail: True
                peer ni:
                  - nid: 10.9.60.24@tcp
                    state: NA

          How are you adding peers? Can you list the commands you are running to add them?

          That said, I tried both ways - using the "lnetctl" command and running traffic - and was able to import peers correctly.


          mhaakddn Malcolm Haak - NCI (Inactive) added a comment - - edited

          I think you misunderstand. It's merging peers that AREN'T supposed to be merged.

          Say peer A is in the file with a nid of 10.9.12.38@o2ib3:

          peer:
              - primary nid: 10.9.12.38@o2ib3
                Multi-Rail: True
                peer ni:
                  - nid: 10.9.12.28@o2ib3
                    state: up
          

          and peer B is next in the YAML file with a nid of 10.9.60.1@o2ib3

          peer:
              - primary nid: 10.9.60.1@o2ib3
                Multi-Rail: False
                peer ni:
                  - nid: 10.9.60.1@o2ib3
                    state: up
          

          So the resulting peer config YAML file should look like

          peer:
              - primary nid: 10.9.12.38@o2ib3
                Multi-Rail: False
                peer ni:
                  - nid: 10.9.12.38@o2ib3
                    state: up
              - primary nid: 10.9.60.1@o2ib3
                Multi-Rail: False
                peer ni:
                  - nid: 10.9.60.1@o2ib3
                    state: up
          

          It's trying to add 10.9.60.1@o2ib3 as an extra peer ni to 10.9.12.38@o2ib3.

          This is wrong. They are separate peers. I've checked the YAML file and both peers are described correctly. There is something wrong with the YAML parser that is causing them not to be parsed correctly.
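
          One rough way to spot this kind of unintended merge after an import (assuming the intended config is in a file, here called peers.yaml) is to compare how many peers the file describes with how many exist in the running config:

            # Peers described in the intended YAML vs. peers actually configured.
            grep -c "primary nid:" peers.yaml
            lnetctl peer show | grep -c "primary nid:"
            # If the second count is lower, some entries were folded into another peer as extra peer NIs.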

          sharmaso Sonia Sharma (Inactive) added a comment -

          Is the issue still happening even after applying the patch?
          When the MR value is correctly imported, LNet would know that the peer is MR and thus that another NID should be merged into the same peer.
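
          To illustrate the distinction (the NIDs below are made up): a Multi-Rail peer is a single entry whose peer ni list carries several NIDs, while non-MR nodes should remain separate single-NID peers, roughly as lnetctl peer show would present them:

            peer:
                - primary nid: 10.0.0.1@tcp
                  Multi-Rail: True
                  peer ni:
                    - nid: 10.0.0.1@tcp
                    - nid: 10.0.1.1@tcp
                - primary nid: 10.0.0.2@tcp
                  Multi-Rail: False
                  peer ni:
                    - nid: 10.0.0.2@tcp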

          People

            Assignee: sharmaso Sonia Sharma (Inactive)
            Reporter: mhaakddn Malcolm Haak - NCI (Inactive)
            Votes: 0
            Watchers: 7
