Details
Type: Bug
Resolution: Not a Bug
Priority: Critical
Fix Version/s: None
Affects Version/s: Lustre 2.12.0
Labels: None
Environment: CentOS 7.6 (3.10.0-957.5.1.el7.x86_64), Lustre 2.12.0
Severity: 3
Description
On clients, we're using lnet.service with the following config:
[root@sh-112-12 ~]# cat /etc/lnet.conf
net:
    - net type: o2ib4
      local NI(s):
        - nid:
          interfaces:
              0: ib0
route:
    - net: o2ib1
      gateway: 10.9.0.[31-32]@o2ib4
    - net: o2ib5
      gateway: 10.9.0.[41-42]@o2ib4
    - net: o2ib7
      gateway: 10.9.0.[21-24]@o2ib4
[root@sh-112-12 ~]# lctl list_nids
10.10.112.12@tcp
10.9.112.12@o2ib4
[root@sh-112-12 ~]# dmesg | grep -i lnet
[  397.762804] LNet: HW NUMA nodes: 2, HW CPU cores: 20, npartitions: 2
[  398.995449] LNet: 13837:0:(socklnd.c:2655:ksocknal_enumerate_interfaces()) Ignoring interface enp4s0f1 (down)
[  399.005708] LNet: Added LNI 10.10.112.12@tcp [8/256/0/180]
[  399.011316] LNet: Accept secure, port 988
[  399.060725] LNet: Using FastReg for registration
[  399.075936] LNet: Added LNI 10.9.112.12@o2ib4 [8/256/0/180]
It is unclear at this point why LNet also brings up a tcp NID (10.10.112.12@tcp), since /etc/lnet.conf only configures o2ib4 on ib0.
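A quick way to spot this condition across many clients is to compare the output of lctl list_nids against the networks a node is supposed to have. The following is only a sketch: the function name unexpected_nids is hypothetical, and the sample input is copied from the lctl list_nids output above rather than read from a live node.

```shell
#!/bin/bash
# unexpected_nids EXPECTED_NET...
# Reads NIDs on stdin (one per line, as printed by `lctl list_nids`) and
# prints every NID whose network part is not among the expected networks.
# The function name is hypothetical, chosen for this illustration.
unexpected_nids() {
    local nid net want found
    while read -r nid; do
        [ -n "$nid" ] || continue      # skip empty lines
        net=${nid##*@}                 # text after the last '@', e.g. "o2ib4"
        found=0
        for want in "$@"; do
            if [ "$net" = "$want" ]; then
                found=1
            fi
        done
        if [ "$found" -eq 0 ]; then
            echo "$nid"
        fi
    done
}

# Sample input taken from the client output above; on a real node this
# would be: lctl list_nids | unexpected_nids o2ib4
printf '10.10.112.12@tcp\n10.9.112.12@o2ib4\n' | unexpected_nids o2ib4
```

On the sample input this reports the tcp NID only, which is the one that should not be there.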
client network config:
[root@sh-112-12 ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp4s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 0c:c4:7a:dc:96:ae brd ff:ff:ff:ff:ff:ff
    inet 10.10.112.12/16 brd 10.10.255.255 scope global enp4s0f0
       valid_lft forever preferred_lft forever
    inet6 fe80::ec4:7aff:fedc:96ae/64 scope link
       valid_lft forever preferred_lft forever
3: enp4s0f1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 0c:c4:7a:dc:96:af brd ff:ff:ff:ff:ff:ff
4: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc mq state UP group default qlen 256
    link/infiniband 20:00:10:8b:fe:80:00:00:00:00:00:00:24:8a:07:03:00:a0:9e:20 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
    inet 10.9.112.12/16 brd 10.9.255.255 scope global ib0
       valid_lft forever preferred_lft forever
    inet6 fe80::268a:703:a0:9e20/64 scope link
       valid_lft forever preferred_lft forever
lnet.service origin:
[root@sh-112-12 ~]# rpm -qf /usr/lib/systemd/system/lnet.service
lustre-client-2.12.0-1.el7.x86_64
[root@sh-112-12 ~]# rpm -q --info lustre-client
Name        : lustre-client
Version     : 2.12.0
Release     : 1.el7
Architecture: x86_64
Install Date: Wed 06 Feb 2019 10:13:52 AM PST
Group       : System Environment/Kernel
Size        : 2007381
License     : GPL
Signature   : (none)
Source RPM  : lustre-client-2.12.0-1.el7.src.rpm
Build Date  : Fri 21 Dec 2018 01:53:18 PM PST
Build Host  : trevis-307-el7-x8664-3.trevis.whamcloud.com
Relocations : (not relocatable)
URL         : https://wiki.whamcloud.com/
Summary     : Lustre File System
Description :
Userspace tools and files for the Lustre file system.
[root@sh-112-12 ~]# cat /usr/lib/systemd/system/lnet.service
[Unit]
Description=lnet management

Requires=network-online.target
After=network-online.target openibd.service rdma.service

ConditionPathExists=!/proc/sys/lnet/

[Service]
Type=oneshot
RemainAfterExit=true
ExecStart=/sbin/modprobe lnet
ExecStart=/usr/sbin/lnetctl lnet configure
ExecStart=/usr/sbin/lnetctl import /etc/lnet.conf
ExecStop=/usr/sbin/lustre_rmmod ptlrpc
ExecStop=/usr/sbin/lnetctl lnet unconfigure
ExecStop=/usr/sbin/lustre_rmmod libcfs ldiskfs

[Install]
WantedBy=multi-user.target
This leads to many issues server-side with 2.12, as reported in LU-11888 and LU-11936.
Thanks!
Stephane
Hi Stephane,
lnetctl lnet configure should not configure any networks. The default tcp network would get configured if lctl net up is being run somewhere, since that loads the default tcp network.
To disable discovery you can add
on all the nodes.
My hunch at the moment is that some nodes are running lctl net up or lnetctl lnet configure --all. Either would lead to the tcp network being loaded, especially if there is no "options lnet networks=..." line in your "modprobe.d/lnet.conf" file.
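The modprobe side of that check can be scripted. The sketch below tests whether a modprobe configuration file carries an "options lnet" line that sets the networks parameter. The helper name has_networks_option is hypothetical, and the networks value in the demo (o2ib4 on ib0) is an assumption derived from the client's lnet.conf, not a setting quoted in this ticket.

```shell
#!/bin/bash
# has_networks_option FILE
# Succeeds (exit 0) if FILE contains an "options lnet ... networks=..." line.
# Hypothetical helper name, for illustration only.
has_networks_option() {
    grep -Eq '^[[:space:]]*options[[:space:]]+lnet[[:space:]]+(.*[[:space:]])?networks=' "$1"
}

# Demo on a temporary file. The o2ib4(ib0) value is an assumption based on
# the client's /etc/lnet.conf, not a setting taken from the ticket.
conf=$(mktemp)
printf 'options lnet networks="o2ib4(ib0)"\n' > "$conf"
if has_networks_option "$conf"; then
    echo "networks option present"
else
    echo "networks option missing"
fi
rm -f "$conf"
```

On a real node the argument would be the modprobe.d/lnet.conf file mentioned above, typically under /etc.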
Would you be able to check that?
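One way to run that check is to search unit files and startup scripts for the two invocations named above. This is a minimal sketch: find_default_net_calls is a hypothetical helper, it also matches the long form lctl network up, and the second line of the demo file is invented to show a match (the first line is the ExecStart from the packaged lnet.service, which should not be flagged).

```shell
#!/bin/bash
# find_default_net_calls FILE...
# Prints "file:line:text" for every line that runs `lctl net up`,
# `lctl network up`, or `lnetctl lnet configure --all`.
# Exit status is 1 when nothing matches (grep's convention).
# Hypothetical helper name, for illustration only.
find_default_net_calls() {
    grep -EHn 'lctl[[:space:]]+net(work)?[[:space:]]+up|lnetctl[[:space:]]+lnet[[:space:]]+configure[[:space:]]+--all' "$@"
}

# Demo: the first line is from the packaged unit and must not be reported;
# the second line is an invented example of what the search is looking for.
unit=$(mktemp)
printf '%s\n' \
    'ExecStart=/usr/sbin/lnetctl lnet configure' \
    'ExecStart=/usr/sbin/lnetctl lnet configure --all' > "$unit"
find_default_net_calls "$unit"
rm -f "$unit"
```

On a client, candidate files would include /usr/lib/systemd/system/lnet.service and any site-specific boot scripts.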