[LU-10124] lnetctl: lnetctl import --add not importing peers correctly Created: 16/Oct/17 Updated: 09/Jun/20 Resolved: 21/Sep/18 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.10.0, Lustre 2.10.1, Lustre 2.11.0 |
| Fix Version/s: | Lustre 2.12.0, Lustre 2.10.7 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Malcolm Haak - NCI (Inactive) | Assignee: | Sonia Sharma (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | lnet, lnetctl | ||
| Environment: | CentOS 7.4 |
| Epic/Theme: | lnet, lustre-2.10.1 | ||||
| Severity: | 3 | ||||
| Description |
|
When importing a YAML config file for peers, the import does not correctly set the Multi-Rail property when it is False. For example, this entry:
peer:
    - primary nid: 10.112.1.60@o2ib8
      Multi-Rail: False
      peer ni:
        - nid: 10.112.1.60@o2ib8
          state: up
results in this running config after import:
peer:
    - primary nid: 10.112.1.60@o2ib8
      Multi-Rail: True
      peer ni:
        - nid: 10.112.1.60@o2ib8
          state: up
This isn't an issue for our config yet, but since we will have a mix of Multi-Rail and non-Multi-Rail nodes, it could become one. |
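One plausible way a "Multi-Rail: False" value can turn into True is treating the field's text as a truthy value instead of parsing it as a boolean. The Python sketch below illustrates that pitfall and a stricter parse. It is not lnetctl's code and does not claim to be the actual cause fixed for this ticket; the helper names `buggy_multi_rail` and `parse_multi_rail`, and defaulting a missing key to True, are assumptions for illustration.

```python
from __future__ import annotations

# Hypothetical helpers for illustration only; these are NOT lnetctl functions.


def buggy_multi_rail(peer: dict) -> bool:
    # Pitfall: the text "False" is a non-empty string, so bool() yields True.
    return bool(peer.get("Multi-Rail", True))


def parse_multi_rail(peer: dict) -> bool:
    """Read the Multi-Rail flag of one peer entry.

    Accepts a real bool (as a YAML loader may produce) or the text
    "True"/"False" (any case). A missing key defaults to True; that
    default is an assumption, not documented lnetctl behaviour.
    """
    value = peer.get("Multi-Rail", True)
    if isinstance(value, bool):
        return value
    text = str(value).strip().lower()
    if text in ("true", "yes", "1"):
        return True
    if text in ("false", "no", "0"):
        return False
    raise ValueError(f"unrecognised Multi-Rail value: {value!r}")


entry = {"primary nid": "10.112.1.60@o2ib8", "Multi-Rail": "False"}
print(buggy_multi_rail(entry))  # reproduces the reported symptom
print(parse_multi_rail(entry))
```

Raising on an unrecognised value, rather than guessing, keeps a malformed entry from silently becoming a Multi-Rail peer.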
| Comments |
| Comment by Malcolm Haak - NCI (Inactive) [ 22/Oct/17 ] |
|
We have encountered a second issue when importing peers. Two peers, each with a single primary nid, were merged during import into one peer with two peer NIs. The YAML file used was exported from the same LNet router it was imported on. We are not sure whether this is a race condition when importing 4000+ peers or some other issue. |
| Comment by James Nunez (Inactive) [ 20/Dec/17 ] |
|
Sonia, thank you. |
| Comment by Gerrit Updater [ 02/Feb/18 ] |
|
Sonia Sharma (sonia.sharma@intel.com) uploaded a new patch: https://review.whamcloud.com/31138 |
| Comment by Malcolm Haak - NCI (Inactive) [ 17/Apr/18 ] |
|
This patch should fix the first issue, but the second issue (peer merging) does not appear to be solved. Do you want a second ticket for it? I will get you some log lines from the affected nodes. |
| Comment by Kim Sebo [ 17/Apr/18 ] |
|
Log line on the LNet router:
LNetError: 8507:0:(peer.c:806:lnet_add_peer_ni_to_prim_lpni()) Cannot add NID 10.9.60.1@o2ib3 owned by peer 10.9.60.1@o2ib3 to peer 10.9.12.38@o2ib3
The two 10.9.x.x addresses mentioned correspond to adjacent entries in the config file. |
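The LNetError message states an ownership rule: a NID that already belongs to one peer cannot be attached to a different peer. The Python sketch below is a simplified, hypothetical model of that rule (it is not `peer.c`; the class `PeerTable` and its method are invented for illustration). It assumes peer B already exists on the router, for example from discovery or routed traffic, when the import tries to attach B's NID to peer A.

```python
from __future__ import annotations


class PeerTable:
    """Hypothetical model of NID ownership; not the LNet implementation."""

    def __init__(self) -> None:
        self.owner: dict[str, str] = {}        # nid -> primary nid of owning peer
        self.peers: dict[str, list[str]] = {}  # primary nid -> its peer NIs

    def add_peer_ni(self, prim_nid: str, nid: str) -> None:
        # Refuse to move a NID that another peer already owns.
        current = self.owner.get(nid)
        if current is not None and current != prim_nid:
            raise ValueError(
                f"Cannot add NID {nid} owned by peer {current} to peer {prim_nid}")
        self.owner[nid] = prim_nid
        nis = self.peers.setdefault(prim_nid, [])
        if nid not in nis:
            nis.append(nid)


table = PeerTable()
table.add_peer_ni("10.9.60.1@o2ib3", "10.9.60.1@o2ib3")  # peer B exists first
try:
    # A faulty import that attaches peer B's NID to peer A trips the check.
    table.add_peer_ni("10.9.12.38@o2ib3", "10.9.60.1@o2ib3")
except ValueError as err:
    print(err)
```

In this model the check itself behaves correctly; the message only appears because something asked to add B's NID to A, which matches the reporter's point that the entries were mis-associated before the check ran.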
| Comment by Sonia Sharma (Inactive) [ 17/Apr/18 ] |
|
Is the issue happening even after applying the patch? |
| Comment by Malcolm Haak - NCI (Inactive) [ 18/Apr/18 ] |
|
I think you misunderstand. It's merging peers that AREN'T supposed to be merged. Say peer A is in the file with a nid of 10.9.12.38@o2ib3:
peer:
    - primary nid: 10.9.12.38@o2ib3
      Multi-Rail: True
      peer ni:
        - nid: 10.9.12.28@o2ib3
          state: up
and peer B is next in the YAML file with a nid of 10.9.60.1@o2ib3:
peer:
    - primary nid: 10.9.60.1@o2ib3
      Multi-Rail: False
      peer ni:
        - nid: 10.9.60.1@o2ib3
          state: up
The resulting peer config YAML should then look like:
peer:
    - primary nid: 10.9.12.38@o2ib3
      Multi-Rail: False
      peer ni:
        - nid: 10.9.12.38@o2ib3
          state: up
    - primary nid: 10.9.60.1@o2ib3
      Multi-Rail: False
      peer ni:
        - nid: 10.9.60.1@o2ib3
          state: up
Instead, it's trying to add 10.9.60.1@o2ib3 as an extra peer ni to 10.9.12.38@o2ib3. This is wrong; they are separate peers. I've checked the YAML file and both are described correctly in it. There is something wrong with the YAML parser that is causing it to not parse correctly. |
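Before importing a large file, one way to confirm that it really describes separate peers is to check that no NID is listed under more than one peer. The Python sketch below does that on entries already loaded from the YAML (loading is left out to stay dependency-free; the field names follow the `lnetctl` output shown in this ticket, and `find_shared_nids` is a hypothetical helper, not part of lnetctl).

```python
from __future__ import annotations

from collections import defaultdict


def find_shared_nids(peers: list[dict]) -> dict[str, list[str]]:
    """Return {nid: [primary nids]} for every NID claimed by more than one peer."""
    seen: dict[str, list[str]] = defaultdict(list)
    for peer in peers:
        prim = peer["primary nid"]
        nids = {ni["nid"] for ni in peer.get("peer ni", [])}
        nids.add(prim)  # the primary NID also belongs to its own peer
        for nid in sorted(nids):
            seen[nid].append(prim)
    return {nid: prims for nid, prims in seen.items() if len(prims) > 1}


# Peers A and B as expected in the comment above: no NID is shared.
expected = [
    {"primary nid": "10.9.12.38@o2ib3", "Multi-Rail": "False",
     "peer ni": [{"nid": "10.9.12.38@o2ib3", "state": "up"}]},
    {"primary nid": "10.9.60.1@o2ib3", "Multi-Rail": "False",
     "peer ni": [{"nid": "10.9.60.1@o2ib3", "state": "up"}]},
]
print(find_shared_nids(expected))
```

Running the same check on a `lnetctl export` taken after the import would show whether the router now lists 10.9.60.1@o2ib3 under two peers, which separates a bad input file from a bad import.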
| Comment by Sonia Sharma (Inactive) [ 18/Apr/18 ] |
|
Oh, okay. I noticed that I had never updated the patch, which had an issue; I have just done that. With the updated patch, I tried it on my system and could not replicate the issue:
[root@lutfRtr1-linux ~]# lnetctl ping 10.211.55.9@tcp
ping:
    - primary nid: 10.211.55.9@tcp
      Multi-Rail: False
      peer ni:
        - nid: 10.211.55.9@tcp
[root@lutfRtr1-linux lustre-release]# lnetctl peer add --prim_nid 10.9.60.24@tcp
[root@lutfRtr1-linux lustre-release]# lnetctl peer show
peer:
    - primary nid: 10.211.55.9@tcp
      Multi-Rail: False
      peer ni:
        - nid: 10.211.55.9@tcp
          state: NA
    - primary nid: 10.9.60.24@tcp
      Multi-Rail: True
      peer ni:
        - nid: 10.9.60.24@tcp
          state: NA
[root@lutfRtr1-linux lustre-release]# lnetctl export > out.yaml
[root@lutfRtr1-linux lustre-release]# lnetctl peer show
peer:
[root@lutfRtr1-linux lustre-release]# lnetctl import < out.yaml
[root@lutfRtr1-linux lustre-release]# lnetctl peer show
peer:
    - primary nid: 10.211.55.9@tcp
      Multi-Rail: False
      peer ni:
        - nid: 10.211.55.9@tcp
          state: NA
    - primary nid: 10.9.60.24@tcp
      Multi-Rail: True
      peer ni:
        - nid: 10.9.60.24@tcp
          state: NA
How are you adding peers? Can you list the commands you are running to add them? I tried both ways, using the "lnetctl" command and by running traffic, and was able to import peers correctly. |
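The check in this transcript is an export/import round trip: the peer list before export should match the list after import. For thousands of peers, comparing by eye is impractical, so the Python sketch below compares two already-loaded peer lists by primary nid, Multi-Rail flag and peer NIs, ignoring `state` (which can legitimately differ, e.g. "up" vs "NA"). The helpers `summarize` and `diff_peers` are hypothetical, and obtaining the lists from `lnetctl peer show` output is left to the reader.

```python
from __future__ import annotations


def summarize(peers: list[dict]) -> dict[str, tuple[str, tuple[str, ...]]]:
    """Map primary nid -> (Multi-Rail as lowercase text, sorted peer NIs)."""
    out = {}
    for p in peers:
        mr = str(p.get("Multi-Rail")).lower()  # bool False and "False" both -> "false"
        nids = tuple(sorted(ni["nid"] for ni in p.get("peer ni", [])))
        out[p["primary nid"]] = (mr, nids)
    return out


def diff_peers(before: list[dict], after: list[dict]) -> list[str]:
    """List every primary nid whose peer differs between the two snapshots."""
    b, a = summarize(before), summarize(after)
    problems = []
    for prim in sorted(b.keys() | a.keys()):
        if prim not in a:
            problems.append(f"{prim}: missing after import")
        elif prim not in b:
            problems.append(f"{prim}: unexpected peer after import")
        elif b[prim] != a[prim]:
            problems.append(f"{prim}: {b[prim]} -> {a[prim]}")
    return problems


# The entry from the ticket description, before and after import.
before = [{"primary nid": "10.112.1.60@o2ib8", "Multi-Rail": "False",
           "peer ni": [{"nid": "10.112.1.60@o2ib8", "state": "up"}]}]
after = [{"primary nid": "10.112.1.60@o2ib8", "Multi-Rail": "True",
          "peer ni": [{"nid": "10.112.1.60@o2ib8", "state": "up"}]}]
for line in diff_peers(before, after):
    print(line)
```

An empty result means the round trip preserved every peer; a flipped Multi-Rail flag or a NID that moved to another peer shows up as a line naming the affected primary nid.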
| Comment by Malcolm Haak - NCI (Inactive) [ 20/Apr/18 ] |
|
I think it's a race condition. It happens because we are importing ~4000 nodes while in production (so alongside normal discovery). I doubt you will trigger it with two. |
| Comment by Sonia Sharma (Inactive) [ 02/May/18 ] |
|
Hi Malcolm, can you please attach the YAML file you are using for configuration here? We can try reproducing the issue with it. Thanks |
| Comment by Gerrit Updater [ 02/May/18 ] |
|
Sonia Sharma (sonia.sharma@intel.com) uploaded a new patch: https://review.whamcloud.com/32255 |
| Comment by Sonia Sharma (Inactive) [ 02/May/18 ] |
|
Just pushed the back-ported patch for b2_10 to make it easy for you to apply the patch and test. |
| Comment by Gerrit Updater [ 21/Sep/18 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/31138/ |
| Comment by Peter Jones [ 21/Sep/18 ] |
|
Landed for 2.12 |
| Comment by Gerrit Updater [ 02/Mar/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/32255/ |