[LU-10003] lnetctl error "cannot add network: invalid argument" Created: 19/Sep/17  Updated: 26/Jan/24

Status: Reopened
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.0
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Mahmoud Hanafi Assignee: James A Simmons
Resolution: Unresolved Votes: 2
Labels: None

Issue Links:
Related
is related to LUDOC-392 Clarify lnetctl/lctl interaction Resolved
is related to LU-10391 LNET: Support IPv6 Reopened
is related to LU-5960 Add ability to get peer and connectio... Open
is related to LU-9680 Improve the user land to kernel space... In Progress
is related to LU-16307 sanity-sec: test_31: export for 10.24... Open
is related to LU-10556 lustre client rebuild not building ln... Resolved
is related to LU-16462 conf-sanity sles12.5 test_43a: lctl: ... Resolved
is related to LU-10790 tests: lctl list_nids has been deprec... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Trying to add a second interface fails with an error:

nbpt-serv1 ~ # ip -4 addr 
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
 inet 127.0.0.1/8 scope host lo
 valid_lft forever preferred_lft forever
6: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65520 qdisc mq state UP qlen 1024
 inet 192.168.41.10/24 brd 192.168.41.255 scope global ib0
 valid_lft forever preferred_lft forever
7: ib1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65520 qdisc mq state UP qlen 1024
 inet 10.151.20.103/18 brd 10.151.63.255 scope global ib1
 valid_lft forever preferred_lft forever
8: ib2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65520 qdisc mq state UP qlen 1024
 inet 192.168.44.10/24 brd 192.168.44.255 scope global ib2
 valid_lft forever preferred_lft forever
9: ib3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65520 qdisc mq state UP qlen 1024
 inet 10.151.63.233/18 brd 10.151.63.255 scope global ib3
 valid_lft forever preferred_lft forever
10: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
 inet 172.17.0.156/16 brd 172.17.255.255 scope global bond0
 valid_lft forever preferred_lft forever


nbpt-serv1 ~ # lnetctl net show 
net:
 - net type: lo
 local NI(s):
 - nid: 0@lo
 status: up
 - net type: o2ib
 local NI(s):
 - nid: 10.151.20.103@o2ib
 status: up
 interfaces:
 0: ib1


nbpt-serv1 ~ # lnetctl net add --net o2ib --if ib3
add:
 - net:
 errno: -22
 descr: "cannot add network: Invalid argument"


nbpt-serv1 ~ # lnetctl net del --net o2ib --if ib1
del:
 - net:
 errno: -22
 descr: "cannot del network: Invalid argument"

 

 

 



 Comments   
Comment by Mahmoud Hanafi [ 19/Sep/17 ]

It looks like it fails here:

 rc = l_ioctl(LNET_DEV_ID, IOC_LIBCFS_ADD_LOCAL_NI, data);
(gdb) s
l_ioctl (dev_id=dev_id@entry=0, opc=opc@entry=3233310047, buf=buf@entry=0x618190) at util/l_ioctl.c:106


$1 = {lic_cfg_hdr = {ioc_len = 2728, ioc_version = 65547}, lic_nid = 1407375061237737, lic_ni_intf = {
 "ib3", '\000' <repeats 124 times>, '\000' <repeats 127 times> <repeats 15 times>}, 
 lic_legacy_ip2nets = '\000' <repeats 127 times>, lic_cpts = {0 <repeats 128 times>}, lic_ncpts = 0, lic_status = 0, 
 lic_tcp_bonding = 0, lic_idx = 0, lic_dev_cpt = 0, pad = "\000\000\000", lic_bulk = 0x618b98 "q\004\002"}
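
For reference, the lic_nid value in the gdb dump above can be decoded by hand. The sketch below assumes the usual LNet layout for IPv4-style NIDs (network in the upper 32 bits as LND type << 16 | network number, address in the lower 32 bits) and that LND type 5 is o2ib. decode_nid is a hypothetical helper, not part of lctl or lnetctl.

```shell
#!/bin/sh
# decode_nid: hypothetical helper that turns a numeric 64-bit NID into "address@net".
# Assumptions (not stated in this ticket): IPv4 address in the low 32 bits,
# LND type in bits 48-63, network number in bits 32-47, LND type 5 == o2ib.
decode_nid() (
    nid=$1
    net=$(( (nid >> 32) & 0xffffffff ))
    addr=$(( nid & 0xffffffff ))
    lnd_type=$(( (net >> 16) & 0xffff ))
    net_num=$(( net & 0xffff ))

    case $lnd_type in
        5) lnd=o2ib ;;
        *) lnd="lnd${lnd_type}_" ;;   # unknown type: print the raw number
    esac

    # Network number 0 is printed without a suffix ("o2ib", not "o2ib0").
    if [ "$net_num" -ne 0 ]; then
        lnd="$lnd$net_num"
    fi

    printf '%d.%d.%d.%d@%s\n' \
        $(( (addr >> 24) & 0xff )) $(( (addr >> 16) & 0xff )) \
        $(( (addr >> 8) & 0xff ))  $(( addr & 0xff )) "$lnd"
)

decode_nid 1407375061237737   # prints 10.151.63.233@o2ib
```

The decoded address is the one assigned to ib3 in the description, so the request appears to have been built for the intended interface before the ioctl returned -22.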


Comment by Peter Jones [ 19/Sep/17 ]

Amir

Could you please advise?

Thanks

Peter

Comment by Amir Shehata (Inactive) [ 20/Sep/17 ]

If you try to add the same interface twice, it will fail the second time.

But it should be possible to add a different interface. Can you please enable debug logs and attach them here:

lctl set_param debug=+net
lctl set_param debug=+neterror
lnetctl net add --net o2ib --if ib3
lctl dk > log

Also, just to verify, can you please paste the output from:

lnetctl -h
Comment by Mahmoud Hanafi [ 21/Sep/17 ]

Here is the output:

elrtr10 ~ # lctl clear;lctl set_param debug=+net;lctl set_param debug=+neterror;lnetctl net add --net o2ib --if ib0;lctl dk > log
debug=+net
debug=+neterror
add:
 - net:
 errno: -22
 descr: "cannot add network: Invalid argument"
elrtr10 ~ # cat log
00000400:00000080:38.0F:1506037088.299967:0:3941:0:(module.c:121:libcfs_ioctl()) libcfs ioctl cmd 3233310047
Debug log: 1 lines, 1 kept, 0 dropped, 0 bad.
elrtr10 ~ # ip -o -4 add show
1: lo inet 127.0.0.1/8 scope host lo\ valid_lft forever preferred_lft forever
6: ib0 inet 10.151.26.140/18 brd 10.151.63.255 scope global ib0\ valid_lft forever preferred_lft forever
7: ib1 inet 10.151.26.60/18 brd 10.151.63.255 scope global ib1\ valid_lft forever preferred_lft forever
8: ib2 inet 10.149.26.140/18 brd 10.149.63.255 scope global ib2\ valid_lft forever preferred_lft forever
9: ib3 inet 10.149.26.60/18 brd 10.149.63.255 scope global ib3\ valid_lft forever preferred_lft forever
10: bond0 inet 172.17.0.160/16 brd 172.17.255.255 scope global bond0\ valid_lft forever preferred_lft forever
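
The only line captured in the debug log is the ioctl entry itself. A minimal sketch for splitting that command number into its fields follows, assuming the generic Linux _IOC bit layout (nr in bits 0-7, type in bits 8-15, size in bits 16-29, direction in bits 30-31, as on x86_64). decode_ioctl is a hypothetical helper name.

```shell
#!/bin/sh
# decode_ioctl: hypothetical helper that splits a Linux ioctl command number
# into its _IOC fields. Assumes the asm-generic layout (8/8/14/2 bits).
decode_ioctl() (
    cmd=$1
    nr=$(( cmd & 0xff ))
    magic=$(( (cmd >> 8) & 0xff ))
    size=$(( (cmd >> 16) & 0x3fff ))
    dir=$(( (cmd >> 30) & 0x3 ))

    case $dir in
        0) dir_name=none ;;
        1) dir_name=write ;;
        2) dir_name=read ;;
        3) dir_name=read/write ;;
    esac

    printf 'dir=%s size=%d type=0x%02x nr=%d\n' "$dir_name" "$size" "$magic" "$nr"
)

decode_ioctl 3233310047   # prints: dir=read/write size=184 type=0x65 nr=95
```

This is the same command number that gdb showed for IOC_LIBCFS_ADD_LOCAL_NI in the earlier comment (type 0x65 is ASCII 'e'), so the request reached the libcfs ioctl entry point and was rejected without any further net or neterror messages being logged.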



elrtr10 ~ # lnetctl -h
Try interactive use without arguments or use one of:
"lnet"
"route"
"net"
"routing"
"set"
"import"
"export"
"stats"
"numa"
"peer"
"help"
"exit"
"quit"
"--list-commands"
as argument.
Comment by Amir Shehata (Inactive) [ 22/Sep/17 ]

Can you try this:

lnetctl lnet configure
lnetctl net add --net o2ib --if ib0

If you're bringing up lnet using

lctl net up

then you'll need to run

lnetctl lnet configure

before you're able to use the rest of the lnetctl commands.

In the future, it would be a good idea to use the lnetctl utility for everything:

modprobe lnet
lnetctl lnet configure
lnetctl net add --net o2ib --if ib0,ib1 # or whatever interfaces you'd like to add
# other lnetctl commands
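
The ordering described above can be wrapped in a small helper so that the configure step is never skipped. This is only a sketch: add_lnet_interfaces and the LNETCTL variable are hypothetical names, and it assumes that running "lnetctl lnet configure" on a node where LNet is already configured is harmless, which this ticket does not confirm.

```shell
#!/bin/sh
# LNETCTL is a hypothetical override so the sequence can be dry-run (e.g. LNETCTL=echo).
LNETCTL=${LNETCTL:-lnetctl}

# add_lnet_interfaces NET IFACES
# Runs "lnet configure" first, then adds the interfaces to the network.
# Stops without calling "net add" if the configure step fails.
add_lnet_interfaces() {
    net=$1
    ifaces=$2
    "$LNETCTL" lnet configure || return 1
    "$LNETCTL" net add --net "$net" --if "$ifaces"
}

# Example (not executed here):
#   add_lnet_interfaces o2ib ib3
```

Setting LNETCTL=echo before calling the function prints the two commands instead of running them, which is a cheap way to check the sequence on a production node.
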
Comment by Mahmoud Hanafi [ 22/Sep/17 ]

We load the lustre module and that starts everything. Here is our /etc/modprobe.d/lustre.conf

options ko2iblnd require_privileged_port=0 use_privileged_port=0
options ko2iblnd timeout=150 retry_count=7 map_on_demand=32 peer_credits=63 concurrent_sends=63
options ko2iblnd ntx=32768 credits=32768 fmr_pool_size=8193 


#lnet
options lnet networks="o2ib(ib0,ib1),o2ib313(ib2,ib3)" forwarding=enabled
#options lnet networks="o2ib(ib1),o2ib313(ib3)" forwarding=enabled
options lnet avoid_asym_router_failure=1 check_routers_before_use=1 small_router_buffers=65536 large_router_buffers=8192
options ptlrpc at_max=600 at_min=150
Comment by Amir Shehata (Inactive) [ 22/Sep/17 ]

Yes, so in that case you'll need to call "lnetctl lnet configure"; otherwise you won't be able to add networks or use other lnetctl commands.

There is a way to do what you're doing with the module parameters dynamically, using lnetctl. The added benefit of doing it dynamically is that you can assign tunables per network instead of globally, for example if you want map-on-demand to differ between networks (this will become useful if you're using OPA and MLX, for example, and want to tune them differently on the router).

You can put all the YAML configuration in /etc/lnet.conf and then start lnet as a service, which will import that file.
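
As a rough illustration of per-network tunables, the sketch below writes a small YAML file in the shape that "lnetctl net show" prints. The "net type", "local NI(s)" and "interfaces" keys appear in the output earlier in this ticket; the "lnd tunables" / "map_on_demand" keys and both values are assumptions and should be checked against the output of "lnetctl export" on a configured node. write_lnet_conf is a hypothetical helper name.

```shell
#!/bin/sh
# write_lnet_conf PATH: hypothetical helper that writes an example LNet YAML config.
# The "lnd tunables" section and its values are assumptions, not taken from this ticket.
write_lnet_conf() {
    cat > "$1" <<'EOF'
net:
    - net type: o2ib
      local NI(s):
        - interfaces:
              0: ib0
          lnd tunables:
              map_on_demand: 32
    - net type: o2ib313
      local NI(s):
        - interfaces:
              0: ib2
          lnd tunables:
              map_on_demand: 256
EOF
}

# Example (not executed here): write to a scratch file, review it, then load it.
#   write_lnet_conf /tmp/lnet.conf.example
#   lnetctl lnet configure && lnetctl import < /tmp/lnet.conf.example
```

A safer starting point than writing the file by hand is probably to configure one node with lnetctl commands and capture the result with "lnetctl export", then edit the tunables in that file.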

Comment by Mahmoud Hanafi [ 24/Sep/17 ]

The issue was that we were not running
lnetctl lnet configure

The documentation should state this more clearly.

Comment by Amir Shehata (Inactive) [ 25/Sep/17 ]

I'll push a patch to the manual to clarify that.

Comment by Amir Shehata (Inactive) [ 04/Oct/17 ]

https://review.whamcloud.com/29323

Comment by James A Simmons [ 13/Oct/17 ]

So in reality lnetctl is no longer optional. It has to be installed on a node and run at startup time. In reality lctl net no longer works by itself.

Comment by Amir Shehata (Inactive) [ 13/Oct/17 ]

If you don't want to use MR (Multi-Rail) and you configure everything from module parameters, then you don't need to use lnetctl.

For MR and future LNet features, lnetctl will be the utility to use. No more updates will be made to lctl.

I'm looking at modifying the makefiles to always make lnetctl and install it properly.

Comment by James A Simmons [ 13/Oct/17 ]

Awesome. I was looking at doing the same thing, especially since the Lustre tools are also looking to use libyaml (LU-9324). Can you base your work on top of https://review.whamcloud.com/#/c/28752? A review would also be nice. As you can see with that patch, I killed off libptlctl.a and integrated the YAML stuff into liblnetconfig so liblustreapi.so can use it for LU-9324.

The reason I was also looking at this is that I use the Maloo test suite but set up LNet using lnetctl. The test suite doesn't really like that. So I was looking to make the test suite work flawlessly with lnetctl.

Comment by Peter Jones [ 18/Dec/17 ]

Is any further work needed on this ticket or can the ticket be closed now that the manual has been updated?

Comment by James A Simmons [ 18/Dec/17 ]

Andreas talked about pushing patches that tell the user that lctl net is obsolete. Perhaps this is the perfect ticket to push those patches under?

Comment by Gerrit Updater [ 05/Jan/18 ]

Amir Shehata (amir.shehata@intel.com) uploaded a new patch: https://review.whamcloud.com/30755
Subject: LU-10003 lnet: deprecate lctl net commands
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 37c7bb1c23bed67f07bebbe9234bbcb44647f739

Comment by James A Simmons [ 05/Jan/18 ]

Patch pushed to deprecate the lctl net functions now that the work to make lnetctl a hard requirement has landed for Lustre 2.11.

Comment by Gerrit Updater [ 20/Jan/18 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/30755/
Subject: LU-10003 lnet: deprecate lctl net commands
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 1e6bd608a89ddaa3c731102f3721b26a47c28741

Comment by Peter Jones [ 20/Jan/18 ]

Landed for 2.11

Comment by Gerrit Updater [ 22/Jan/18 ]

Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/30968
Subject: LU-10003 lnet: deprecate lctl net commands
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: b7a878f82eadc21bc91fe52b873538b80f854742

Comment by Andreas Dilger [ 24/Jan/18 ]

The deprecation messages are not very clear. For example, now when I run sanity.sh it reports (among other things):

This command has been deprecated. Plesae use 'lnetctl net show'.

... but it isn't clear what This command is, so I don't even know what command to start looking for in the test scripts. It should print the command name, like:

lctl: 'list_nids' is deprecated. Please use 'lnetctl net show'.

Please fix typo Plesae in error message as well.

Also, use of the deprecated commands like "lctl list_nids" by the test scripts should be replaced by "lnetctl net show", and equivalent for "lctl network" and "lctl ping". It looks like lnetctl has existed since version v2_6_54_0-13-g0f753ea so this should be safe to land for master (we test 2.11 interop against 2.7, but no longer against 2.5).
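
One possible shape for such a replacement in the test scripts is sketched below. It filters "lnetctl net show" output down to one NID per line, similar to what "lctl list_nids" prints. nids_from_net_show is a hypothetical helper, and skipping the 0@lo loopback NID is an assumption about what callers want.

```shell
#!/bin/sh
# nids_from_net_show: hypothetical helper that reads `lnetctl net show` YAML on
# stdin and prints one NID per line. Loopback NIDs (…@lo) are skipped by assumption.
nids_from_net_show() {
    awk '
        $1 == "-" && $2 == "nid:" { nid = $3 }   # "- nid: 10.151.20.103@o2ib"
        $1 == "nid:"              { nid = $2 }   # "nid: ..." without a list marker
        nid != "" {
            if (nid !~ /@lo$/) print nid
            nid = ""
        }
    '
}

# Example (not executed here):
#   lnetctl net show | nids_from_net_show
```

Fed the "lnetctl net show" output from the description, this prints only 10.151.20.103@o2ib.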

Comment by Amir Shehata (Inactive) [ 25/Jan/18 ]

Andreas, besides fixing the deprecation message, are you suggesting we also change lctl usage to lnetctl usage in the test scripts as part of this ticket?

Comment by Gerrit Updater [ 26/Jan/18 ]

Amir Shehata (amir.shehata@intel.com) uploaded a new patch: https://review.whamcloud.com/31030
Subject: LU-10003 lnet: clarify lctl deprecation message
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 410c34ffb78be293804b39d2bd75129b67a4a309

Comment by Gerrit Updater [ 26/Jan/18 ]

Amir Shehata (amir.shehata@intel.com) uploaded a new patch: https://review.whamcloud.com/31031
Subject: LU-10003 tests: replace lctl with lnetctl for lnet
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: bf6785279267934826457b3051e5ea8ca8734634

Comment by Amir Shehata (Inactive) [ 02/Feb/18 ]

I chatted with jhammond regarding the deprecation, and he thinks that deprecating list_nids is going to cause trouble for the customer. We would like some feedback on whether we should remove list_nids from the set of commands being deprecated.

Comment by Gerrit Updater [ 06/Feb/18 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/31030/
Subject: LU-10003 lnet: clarify lctl deprecation message
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 73867ddf0c34d5e702ee4dcda850cf974bdb33da

Comment by John Hammond [ 27/Feb/18 ]

I think the deprecation warnings should be removed entirely.

Comment by James A Simmons [ 27/Feb/18 ]

For that to happen, lctl would have to work again. The changes to the LNet layer for multi-rail support broke the lctl net functionality. As a side note, the current lnetctl tools don't work with pre-2.10 versions.

Comment by Peter Jones [ 06/Mar/18 ]

John L. Hammond (john.hammond@intel.com) uploaded a new patch: https://review.whamcloud.com/31534
Subject: LU-10003 lnet: remove lctl deprecation messages
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 5009ae1fdc2110a4251a57b58bbbb82663f2edab

Comment by Gerrit Updater [ 08/Mar/18 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/31534/
Subject: LU-10003 lnet: remove lctl deprecation messages
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 423bdae48e1a0ed602f524ba41a2c83c67f4716a

Comment by Mahmoud Hanafi [ 29/Aug/18 ]

This can be closed.

Comment by Peter Jones [ 30/Aug/18 ]

I think that this ticket had been kept open because https://review.whamcloud.com/#/c/31031/ has not landed yet. Is this still needed? If so, let's reopen the ticket and rebase the patch to get it landed; if not, let's abandon the patch.

Comment by Gerrit Updater [ 10/Oct/22 ]

"James Simmons <jsimmons@infradead.org>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/48814
Subject: LU-10003 lnet: use Netlink to support old and new NI APIs.
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: ca490c04254a8b345a5ef4ea6037dd082edd8344

Comment by James A Simmons [ 10/Oct/22 ]

Using this ticket to unify the MR and pre-MR APIs. We can kill off the old ioctls and keep the lctl LNet functionality, since it will never go away.

Comment by Gerrit Updater [ 08/Nov/22 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/48814/
Subject: LU-10003 lnet: use Netlink to support old and new NI APIs.
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 8f8f6e2f36e56e53e38447583a08b7794f156c47

Comment by James A Simmons [ 08/Nov/22 ]

More patches to go.

Comment by Gerrit Updater [ 08/Nov/22 ]

"James Simmons <jsimmons@infradead.org>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/49068
Subject: LU-10003 lnet: to properly handle errors reported use ssize_t
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: e0f25c6fec1c5fdfcc9005407f7c06f5529a879f

Comment by Gerrit Updater [ 10/Nov/22 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/49068/
Subject: LU-10003 lnet: to properly handle errors reported use ssize_t
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 3737b28a9d014a0d0cf6a072f874990e771e06a3

Comment by Gerrit Updater [ 10/Dec/22 ]

"James Simmons <jsimmons@infradead.org>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/49360
Subject: LU-10003 lnet: use Netlink to support LNet ping commands
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: c9e8f052bcf896a07e440f383a104b5006dcaf71

Comment by Gerrit Updater [ 19/Jan/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/49360/
Subject: LU-10003 lnet: use Netlink to support LNet ping commands
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: d137e9823ca1e97fccee59913d5a7bf1891b825a

Comment by James A Simmons [ 19/Jan/23 ]

More work left.

Comment by Gerrit Updater [ 27/Mar/23 ]

"James Simmons <jsimmons@infradead.org>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50440
Subject: LU-10003 utils: migrate old route API to Netlink / YAML API
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 10e1e556bc07546d79e0c41e319dce60c58bb304

Comment by Gerrit Updater [ 03/Nov/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/50440/
Subject: LU-10003 lnet: migrate old route API to Netlink / YAML API
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 8b51c333716adade3c89b10fedbe2ca1851e027c

Comment by Gerrit Updater [ 27/Dec/23 ]

"James Simmons <jsimmons@infradead.org>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/53556
Subject: LU-10003 lnet: implement Netlink version of lnet distance API.
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 7f98f120831eee86b5657966774061d3321abf3c

Comment by Gerrit Updater [ 10/Jan/24 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/53556/
Subject: LU-10003 lnet: implement Netlink version of lnet distance API.
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: b62c385e180fbdf85533e334fa63d6b9c6bb2452

Comment by Gerrit Updater [ 26/Jan/24 ]

"Chris Horn <chris.horn@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/53835
Subject: LU-10003 lnet: Update lctl ping to work with large NIDs
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: dc0098e601cb613a87946548a2ffa5a8f4d4e2c0

Generated at Sat Feb 10 02:31:11 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.