Details

    • Bug
    • Resolution: Not a Bug
    • Critical
    • None
    • Lustre 2.12.0
    • None
    • CentOS 7.6 (3.10.0-957.5.1.el7.x86_64), Lustre 2.12.0
    • 3

    Description

      On clients, we're using lnet.service with the following config:

      [root@sh-112-12 ~]# cat /etc/lnet.conf 
      net:
          - net type: o2ib4
            local NI(s):
              - nid:
                interfaces:
                    0: ib0
      route: 
          - net: o2ib1
            gateway: 10.9.0.[31-32]@o2ib4
          - net: o2ib5
            gateway: 10.9.0.[41-42]@o2ib4
          - net: o2ib7
            gateway: 10.9.0.[21-24]@o2ib4
      [root@sh-112-12 ~]# lctl list_nids
      10.10.112.12@tcp
      10.9.112.12@o2ib4
      
      [root@sh-112-12 ~]# dmesg | grep -i lnet
      [  397.762804] LNet: HW NUMA nodes: 2, HW CPU cores: 20, npartitions: 2
      [  398.995449] LNet: 13837:0:(socklnd.c:2655:ksocknal_enumerate_interfaces()) Ignoring interface enp4s0f1 (down)
      [  399.005708] LNet: Added LNI 10.10.112.12@tcp [8/256/0/180]
      [  399.011316] LNet: Accept secure, port 988
      [  399.060725] LNet: Using FastReg for registration
      [  399.075936] LNet: Added LNI 10.9.112.12@o2ib4 [8/256/0/180]
      

      At this point it is unclear why LNet also adds a tcp NID (10.10.112.12@tcp above), since /etc/lnet.conf only defines the o2ib4 network.
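      A quick way to spot an affected client is to filter the `lctl list_nids` output for NIDs on networks that are not expected. The following is a minimal sketch, assuming o2ib4 is the only network a client should have; the helper name `unexpected_nids` is hypothetical and not part of Lustre.

```shell
#!/bin/sh
# Hypothetical helper (not a Lustre tool): read NIDs from stdin, one per
# line, and print those whose network is not in the allowed list given
# as arguments.
unexpected_nids() {
    allowed=" $* "                  # pad with spaces for whole-word matching
    while IFS= read -r nid; do
        [ -n "$nid" ] || continue   # skip empty lines
        net=${nid##*@}              # network is the part after the last '@'
        case "$allowed" in
            *" $net "*) ;;          # expected network: print nothing
            *) printf '%s\n' "$nid" ;;
        esac
    done
}
```

      On a client this could be run as `lctl list_nids | unexpected_nids o2ib4`; with the output shown above it would print only the tcp NID.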

       

      client network config:

      [root@sh-112-12 ~]# ip addr
      1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
          link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
          inet 127.0.0.1/8 scope host lo
             valid_lft forever preferred_lft forever
          inet6 ::1/128 scope host 
             valid_lft forever preferred_lft forever
      2: enp4s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
          link/ether 0c:c4:7a:dc:96:ae brd ff:ff:ff:ff:ff:ff
          inet 10.10.112.12/16 brd 10.10.255.255 scope global enp4s0f0
             valid_lft forever preferred_lft forever
          inet6 fe80::ec4:7aff:fedc:96ae/64 scope link 
             valid_lft forever preferred_lft forever
      3: enp4s0f1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
          link/ether 0c:c4:7a:dc:96:af brd ff:ff:ff:ff:ff:ff
      4: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc mq state UP group default qlen 256
          link/infiniband 20:00:10:8b:fe:80:00:00:00:00:00:00:24:8a:07:03:00:a0:9e:20 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
          inet 10.9.112.12/16 brd 10.9.255.255 scope global ib0
             valid_lft forever preferred_lft forever
          inet6 fe80::268a:703:a0:9e20/64 scope link 
             valid_lft forever preferred_lft forever
      

       

      lnet.service origin:

      [root@sh-112-12 ~]# rpm -qf /usr/lib/systemd/system/lnet.service 
      lustre-client-2.12.0-1.el7.x86_64
      [root@sh-112-12 ~]# rpm -q --info lustre-client
      Name        : lustre-client
      Version     : 2.12.0
      Release     : 1.el7
      Architecture: x86_64
      Install Date: Wed 06 Feb 2019 10:13:52 AM PST
      Group       : System Environment/Kernel
      Size        : 2007381
      License     : GPL
      Signature   : (none)
      Source RPM  : lustre-client-2.12.0-1.el7.src.rpm
      Build Date  : Fri 21 Dec 2018 01:53:18 PM PST
      Build Host  : trevis-307-el7-x8664-3.trevis.whamcloud.com
      Relocations : (not relocatable)
      URL         : https://wiki.whamcloud.com/
      Summary     : Lustre File System
      Description :
      Userspace tools and files for the Lustre file system.
      [root@sh-112-12 ~]# cat /usr/lib/systemd/system/lnet.service 
      [Unit]
      Description=lnet management
      
      Requires=network-online.target
      After=network-online.target openibd.service rdma.service
      
      ConditionPathExists=!/proc/sys/lnet/
      
      [Service]
      Type=oneshot
      RemainAfterExit=true
      ExecStart=/sbin/modprobe lnet
      ExecStart=/usr/sbin/lnetctl lnet configure
      ExecStart=/usr/sbin/lnetctl import /etc/lnet.conf
      ExecStop=/usr/sbin/lustre_rmmod ptlrpc
      ExecStop=/usr/sbin/lnetctl lnet unconfigure
      ExecStop=/usr/sbin/lustre_rmmod libcfs ldiskfs
      
      [Install]
      WantedBy=multi-user.target
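
      To narrow down which step of the unit brings up the tcp NID, the three ExecStart commands can be replayed by hand with the network list printed after each one. This is only a sketch under assumptions: LNet is not loaded yet (for example after stopping lnet.service), the function name `trace_lnet_start` and the `DRY_RUN` switch are hypothetical, and `lnetctl net show` may simply report an error before LNet is configured.

```shell
#!/bin/sh
# Hypothetical helper: replay the ExecStart steps of lnet.service one by one.
# With DRY_RUN unset or set to 1 the commands are only printed; with
# DRY_RUN=0 (as root) each step is executed and followed by the list of
# configured networks.
trace_lnet_start() {
    conf=${1:-/etc/lnet.conf}       # assumes a path without spaces
    for step in "/sbin/modprobe lnet" \
                "/usr/sbin/lnetctl lnet configure" \
                "/usr/sbin/lnetctl import $conf"; do
        echo "+ $step"
        if [ "${DRY_RUN:-1}" = 0 ]; then
            $step || return 1       # intentionally unquoted: split into words
            # May fail before LNet is configured; ignore that.
            /usr/sbin/lnetctl net show || true
        fi
    done
}
```

      Calling `trace_lnet_start` prints the three commands; `DRY_RUN=0 trace_lnet_start` runs them and shows after which step, if any, a tcp network appears.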
      

      The unexpected tcp NIDs lead to many server-side issues with 2.12, as reported in LU-11888 and LU-11936.

      Thanks!
      Stephane


          [LU-11937] lnet.service randomly load tcp NIDs

          People

            ashehata Amir Shehata (Inactive)
            sthiell Stephane Thiell