Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-16908

lnet module ip2nets parameter not acting as documented

    XMLWordPrintable

Details

    • Bug
    • Resolution: Unresolved
    • Minor
    • None
    • None
    • 3
    • 9223372036854775807

    Description

      The ip2nets parameter, when matching a bound IP address on an interface of the node, may activate a different LND interface with the wrong IP address.

      This may be related to LUs:
      https://jira.whamcloud.com/browse/LU-7563
      https://jira.whamcloud.com/browse/LU-11859

      The Note/example on page 69 of https://build.whamcloud.com/job/lustre-manual/lastSuccessfulBuild/artifact/lustre_manual.pdf doesn't match what actually happens.

       

      If an interface is explicitly specified as well as a pattern, the interface matched using the IP pattern will be sanitized against the explicitly-defined interface.
      
      For example, tcp(eth0) 192.168.*.3 and there exists in the system eth0 == 192.158.19.3 and eth1 == 192.168.3.3, then the configuration will fail, because the pattern contradicts the interface specified.
      
      A clear warning will be displayed if inconsistent configuration is encountered.
      

       

      Example showing the issue with stock lnet.ko, working through the above steps:

      [root@lustre.test ~]# ip netns add test
      [root@lustre.test ~]# ip -n test link add eth0 type dummy
      [root@lustre.test ~]# ip -n test link add eth1 type dummy
      [root@lustre.test ~]# ip -n test addr add 127.0.0.1/8 brd + dev lo
      [root@lustre.test ~]# ip -n test addr add 192.158.19.3/24 brd + dev eth0
      [root@lustre.test ~]# ip -n test addr add 192.168.3.3/24 brd + dev eth1
      [root@lustre.test ~]# ip -n test link set up dev lo
      [root@lustre.test ~]# ip -n test link set up dev eth0
      [root@lustre.test ~]# ip -n test link set up dev eth1
      
      [root@lustre.test ~]# ip netns exec test bash
      
      [root@lustre.test ~]# ip -4 addr list
      1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
          inet 127.0.0.1/8 brd 127.255.255.255 scope host lo
             valid_lft forever preferred_lft forever
      2: eth0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
          inet 192.158.19.3/24 brd 192.158.19.255 scope global eth0
             valid_lft forever preferred_lft forever
      3: eth1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
          inet 192.168.3.3/24 brd 192.168.3.255 scope global eth1
             valid_lft forever preferred_lft forever
      
      [root@lustre.test ~]# modinfo -F version lnet
      0.7.0
      
      [root@lustre.test ~]# modinfo -F version lustre
      2.15.2
      
      [root@lustre.test ~]# cat /etc/modprobe.d/lnet.conf
      options lnet "ip2nets=tcp(eth0) 192.168.*.3"
      
      [root@lustre.test ~]# modprobe lnet
      
      [root@lustre.test ~]# cat /sys/module/lnet/parameters/ip2nets
      tcp(eth0) 192.168.*.3
      
      [root@lustre.test ~]# lnetctl net show
      show:
          - net:
                errno: -100
                descr: "cannot get networks: Network is down"
      
      [root@lustre.test ~]# lnetctl lnet configure --all
      
      [root@lustre.test ~]# lnetctl net show
      net:
          - net type: lo
            local NI(s):
              - nid: 0@lo
                status: up
          - net type: tcp
            local NI(s):
              - nid: 192.158.19.3@tcp
                status: up
                interfaces:
                    0: eth0
      
      [root@lustre.test ~]# dmesg
      LNet: HW NUMA nodes: 2, HW CPU cores: 56, npartitions: 2
      alg: No test for adler32 (adler32-zlib)
      Key type ._llcrypt registered
      Key type .llcrypt registered
      LNet: Added LNI 192.158.19.3@tcp [8/256/0/180]
      LNet: Accept secure, port 988
      

       

      Same steps with patched lnet.ko (without IP stack setup steps):

      [root@lustre.test ~]# ip netns exec test bash
      
      [root@lustre.test ~]# ip -4 addr list
      1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
          inet 127.0.0.1/8 brd 127.255.255.255 scope host lo
             valid_lft forever preferred_lft forever
      2: eth0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
          inet 192.158.19.3/24 brd 192.158.19.255 scope global eth0
             valid_lft forever preferred_lft forever
      3: eth1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
          inet 192.168.3.3/24 brd 192.168.3.255 scope global eth1
             valid_lft forever preferred_lft forever
      
      [root@lustre.test ~]# cat /etc/modprobe.d/lnet.conf
      options lnet "ip2nets=tcp(eth0) 192.168.*.3"
      
      [root@lustre.test ~]# modprobe lnet
      
      [root@lustre.test ~]# cat /sys/module/lnet/parameters/ip2nets
      tcp(eth0) 192.168.*.3
      
      [root@lustre.test ~]# lnetctl net show
      show:
          - net:
                errno: -100
                descr: "cannot get networks: Network is down"
      
      [root@lustre.test ~]# lnetctl lnet configure --all
      configure:
          - lnet:
                errno: -22
                descr: "LNet configure error: Invalid argument"
      
      [root@lustre.test ~]# dmesg
      LNet: HW NUMA nodes: 2, HW CPU cores: 56, npartitions: 2
      alg: No test for adler32 (adler32-zlib)
      Key type ._llcrypt registered
      Key type .llcrypt registered
      LNetError: 11a-a: ip2nets does not match any local IP interfaces
      LNetError: 47753:0:(config.c:574:lnet_parse_networks()) networks string is undefined
      
      [root@lustre.test ~]# modprobe -r lnet ; dmesg -C
      
      [root@lustre.test ~]# modprobe lnet "ip2nets=tcp(eth0,eth10,eth1) 192.168.*.3"
      
      [root@lustre.test ~]# cat /sys/module/lnet/parameters/ip2nets
      tcp(eth0,eth10,eth1) 192.168.*.3
      
      [root@lustre.test ~]# lnetctl net show
      show:
          - net:
                errno: -100
                descr: "cannot get networks: Network is down"
      
      [root@lustre.test ~]# lnetctl lnet configure --all
      
      [root@lustre.test ~]# lnetctl net show
      net:
          - net type: lo
            local NI(s):
              - nid: 0@lo
                status: up
          - net type: tcp
            local NI(s):
              - nid: 192.168.3.3@tcp
                status: up
                interfaces:
                    0: eth1
      
      [root@lustre.test ~]# dmesg
      LNet: HW NUMA nodes: 2, HW CPU cores: 56, npartitions: 2
      alg: No test for adler32 (adler32-zlib)
      Key type ._llcrypt registered
      Key type .llcrypt registered
      LNet: ip2nets matched tcp(eth1) for 192.168.*.3
      LNet: Added LNI 192.168.3.3@tcp [8/256/0/180]
      LNet: Accept secure, port 988
      

       

      I wrote the patch against the 2.15.2 tag in hopes of making the code follow my interpretation of the documentation for the module parameter. Hopefully the area where I added the check is the correct spot.

      Attachments

        Activity

          People

            wc-triage WC Triage
            joshs Josh Samuelson
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated: