[LU-13750] LNet lnetctl: peer add doesn't work as specified in the manual Created: 04/Jul/20  Updated: 11/Oct/22  Resolved: 25/Aug/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.14.0

Type: Bug Priority: Minor
Reporter: Amir Shehata (Inactive) Assignee: Amir Shehata (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

The manual states

The --prim-nid (primary nid for the peer node) can go unspecified. In this case, the first listed NID in the --nid option becomes the primary nid of the peer. For example:

However, after

LU-12410 lnet: Convert lnetctl peer add and del

it now creates a separate peer for each NID in the list, which is not correct.



 Comments   
Comment by Gerrit Updater [ 16/Jul/20 ]

Amir Shehata (ashehata@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/39392
Subject: LU-13750 lnet: Fix peer add command
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 261ba9963abc33ac129a71fae7b8617b31ff27a0

Comment by Chris Horn [ 16/Jul/20 ]

The change in behavior was intentional, but I guess I dropped the ball on getting the manual updated...

We had a discussion about this with James Simmons in Slack where we all agreed to drop the --ip2nets flag from these commands and modify the behavior. This was my proposal in Slack, which the three of us agreed to (afair):

So I'm clear, the thinking is:
1. lnetctl peer add --prim_nid <pnid> --nid <nidlist1<,nidlist2...>>
  - Add all nids in nidlist to peer specified primary nid
2. lnetctl peer add --nid <nidlist1<,nidlist2...>>
  - For each nidlist, create a peer whose primary nid is the first nid in the nidlist, and add the remaining nids in the list as secondary nids
e.g. for 2. lnetctl peer add --nid 1.1.1.[1-2]@tcp,2.2.2.[2-3]@o2ib
That would create two peers
first peer has primary 1.1.1.1@tcp, and secondary 1.1.1.2@tcp
second peer has primary 2.2.2.2@o2ib, and secondary 2.2.2.3@o2ib
Whereas for 1. lnetctl peer add --prim_nid 1.1.1.1@tcp --nid 1.1.1.2@tcp,2.2.2.[2-3]@o2ib
That would create a single peer with primary 1.1.1.1@tcp and secondary nids 1.1.1.2@tcp, 2.2.2.2@o2ib, and 2.2.2.3@o2ib

I think the new behavior has value because it allows someone to create a bunch of peers in one go. Do you not think it is a good idea anymore? Should we just update the documentation rather than reverting to the old behavior?
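
For readers comparing the two forms in the proposal above, the grouping rules can be modelled in a few lines of Python. This is only an illustrative sketch, not the lnetctl implementation: `expand_nidlist` and `peer_add` are hypothetical helper names, and the range parser assumes a single simple `[lo-hi]` bracket per expression, which is a small subset of the real NID range syntax.

```python
import re

# Assumption: only a single "[lo-hi]" numeric range per expression is handled.
_RANGE = re.compile(r"\[(\d+)-(\d+)\]")


def expand_nidlist(expr):
    """Expand one simplified NID range expression, e.g. '1.1.1.[1-2]@tcp'."""
    match = _RANGE.search(expr)
    if match is None:
        return [expr]  # a plain NID with no range
    lo, hi = int(match.group(1)), int(match.group(2))
    return [expr[:match.start()] + str(i) + expr[match.end():]
            for i in range(lo, hi + 1)]


def peer_add(nid_arg, prim_nid=None):
    """Model the proposed grouping (hypothetical helper, not lnetctl code).

    With prim_nid: every NID in every nidlist joins that one peer.
    Without prim_nid: each comma-separated nidlist becomes its own peer,
    whose primary NID is the first NID of that nidlist.
    """
    nidlists = [expand_nidlist(e) for e in nid_arg.split(",") if e]
    if prim_nid is not None:
        secondary = [nid for nidlist in nidlists for nid in nidlist]
        return [{"primary": prim_nid, "secondary": secondary}]
    return [{"primary": nidlist[0], "secondary": nidlist[1:]}
            for nidlist in nidlists]


if __name__ == "__main__":
    # Form 2 from the proposal: two peers are created.
    print(peer_add("1.1.1.[1-2]@tcp,2.2.2.[2-3]@o2ib"))
    # Form 1 from the proposal: a single peer is created.
    print(peer_add("1.1.1.2@tcp,2.2.2.[2-3]@o2ib", prim_nid="1.1.1.1@tcp"))
```

The point of the sketch is that the same --nid argument yields a different number of peers depending on whether --prim_nid is present.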

Comment by Amir Shehata (Inactive) [ 16/Jul/20 ]

I don't recall this discussion, and I don't think the new behaviour is clear. You can only specify the NID list via the NID range syntax, and there is no way to explicitly assign multiple NIDs to the same peer using only the --nid syntax. I ran into this while refreshing the LUTF scripts. I would prefer the documented syntax, as it is much clearer: you add one peer per command. It is confusing when the behaviour changes with the addition of the --prim_nid parameter. It is similar to adding networks: you can add one network and specify multiple different interfaces for it, but the structure of the network command doesn't allow you to add multiple different networks from the same command line.

If anything I would update the command to always require the --prim_nid.
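
As a rough illustration of the documented semantics (one peer per command, with the first listed NID becoming the primary when --prim_nid is omitted), the rule can be sketched in Python as follows. This is a model for discussion only, not the lnetctl code: `expand_nidlist` and `peer_add_documented` are hypothetical names, and the parser assumes a single simple `[lo-hi]` bracket per expression rather than the full NID range syntax.

```python
import re

# Assumption: only a single "[lo-hi]" numeric range per expression is handled.
_RANGE = re.compile(r"\[(\d+)-(\d+)\]")


def expand_nidlist(expr):
    """Expand one simplified NID range expression, e.g. '2.2.2.[2-3]@o2ib'."""
    match = _RANGE.search(expr)
    if match is None:
        return [expr]  # a plain NID with no range
    lo, hi = int(match.group(1)), int(match.group(2))
    return [expr[:match.start()] + str(i) + expr[match.end():]
            for i in range(lo, hi + 1)]


def peer_add_documented(nid_arg, prim_nid=None):
    """Model the documented rule: exactly one peer per command.

    All NIDs from all comma-separated expressions belong to the same peer.
    If prim_nid is omitted, the first listed NID becomes the primary.
    """
    nids = [nid for expr in nid_arg.split(",") if expr
            for nid in expand_nidlist(expr)]
    if prim_nid is None:
        if not nids:
            raise ValueError("at least one NID is required")
        prim_nid, nids = nids[0], nids[1:]
    return {"primary": prim_nid, "secondary": nids}


if __name__ == "__main__":
    # One peer, regardless of how many nidlists are given.
    print(peer_add_documented("1.1.1.[1-2]@tcp,2.2.2.[2-3]@o2ib"))
```

Under this model the number of peers created never depends on whether --prim_nid was given; only the choice of primary NID does.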

Comment by Chris Horn [ 16/Jul/20 ]

I'm in favor of CLI consistency. Just to clarify, there is a way to assign multiple NIDs to a single peer with just the --nid syntax. It is described in my previous comment:

2. lnetctl peer add --nid <nidlist1<,nidlist2...>>
  - For each nidlist, create a peer whose primary nid is the first nid in the nidlist, and add the remaining nids in the list as secondary nids

e.g.

sles15build01:~ # lnetctl peer add --nid 192.168.1.[2-5]@tcp
sles15build01:~ # lnetctl peer show
peer:
    - primary nid: 192.168.1.2@tcp
      Multi-Rail: True
      peer ni:
        - nid: 192.168.1.2@tcp
          state: NA
        - nid: 192.168.1.3@tcp
          state: NA
        - nid: 192.168.1.4@tcp
          state: NA
        - nid: 192.168.1.5@tcp
          state: NA
sles15build01:~ #

Comment by Gerrit Updater [ 25/Aug/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/39392/
Subject: LU-13750 lnet: Fix peer add command
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 30f6c3d601fb9e7bc5af8dfc7a6a4abd404aea18

Comment by Peter Jones [ 25/Aug/20 ]

Landed for 2.14
