sanity-lnet : @@@@@@ FAIL: Found 2 interfaces for NID

Details

    • Bug
    • Resolution: Done
    • Critical
    • Lustre 2.15.0
    • None
    • None
    • 3
    • 9223372036854775807

    Description

      The sanity-lnet session fails with the following error:

       sanity-lnet : @@@@@@ FAIL: Found 2 interfaces for NID 10.240.22.240@tcp. Expect 1 
        Trace dump:
        = /usr/lib64/lustre/tests/test-framework.sh:6332:error()
        = /usr/lib64/lustre/tests/sanity-lnet.sh:266:main() 

      I have hit it on several different patches:
      https://testing.whamcloud.com/test_sets/730b47fe-077e-4d12-ae23-7e779c9ef774
      https://testing.whamcloud.com/test_sets/26a93006-3d7b-45df-b37f-84cbd895523b
      https://testing.whamcloud.com/test_sets/872224ab-3a6f-4ded-9fa2-b8d274baf5d1

       

       

      Attachments

        Activity

          [LU-15232] sanity-lnet : @@@@@@ FAIL: Found 2 interfaces for NID

          shadow Alexey Lyashkov added a comment - Andreas, this looks like a configuration issue on our nodes: the first address comes from the static configuration and the second from the DHCP client. The simplest approach is to fix our own configuration rather than change the Lustre test framework.
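          A node-side check for the situation described in this comment could look roughly like the sketch below. It is only an illustration, not part of the Lustre test framework: `find_duplicate_addrs` is a hypothetical helper name, the input is assumed to be in the one-line format printed by `ip -o -4 addr show`, and the sample addresses are documentation placeholders rather than values from a real node.

```shell
#!/bin/bash
# Hypothetical helper: read `ip -o -4 addr show` style lines on stdin and
# print "<iface> <ip>" for every IPv4 address that is configured more than
# once on the same interface (for example once statically, once by DHCP).
find_duplicate_addrs() {
	awk '{
		split($4, a, "/")        # $4 is "address/prefix"; keep the address
		key = $2 " " a[1]        # $2 is the interface name
		if (++count[key] == 2)   # report each duplicate only once
			print key
	}'
}

# Illustrative input only; these addresses are placeholders.
example='2: eth0    inet 192.0.2.10/16 brd 192.0.255.255 scope global noprefixroute eth0\       valid_lft forever preferred_lft forever
2: eth0    inet 192.0.2.10/20 brd 192.0.15.255 scope global dynamic noprefixroute eth0\       valid_lft 20848sec preferred_lft 20848sec
3: eth1    inet 198.51.100.7/24 brd 198.51.100.255 scope global eth1\       valid_lft forever preferred_lft forever'

printf '%s\n' "$example" | find_duplicate_addrs
```

          On a live node the same helper could be fed directly with `ip -o -4 addr show | find_duplicate_addrs`; the `dynamic` flag in the `ip` output marks the address that has a limited lifetime, as a DHCP-assigned address does.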

          adilger Andreas Dilger added a comment - NB: this was originally LU-15210, but it was temporarily moved to a different project and then moved back, so it now has the new ticket number LU-15232.

          adilger Andreas Dilger added a comment - Patch landed for 2.15; it also needs to land on other branches.

          gerrit Gerrit Updater added a comment - "Andreas Dilger <adilger@whamcloud.com>" merged in patch https://review.whamcloud.com/45551/
          Subject: LU-15210 tests: fix sanity-lnet to handle duplicate IP
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: a50eaae974ee04364c9fbbb4625dd3d581a8c986

          ssmirnov Serguei Smirnov added a comment - It appears that the issue is caused by the following configuration, in which eth0 carries the same IPv4 address twice (once with a /16 prefix and once as a dynamic /20 address):

          1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
              link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
              inet 127.0.0.1/8 scope host lo
                 valid_lft forever preferred_lft forever
              inet6 ::1/128 scope host
                 valid_lft forever preferred_lft forever
          2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
              link/ether 52:54:00:84:f0:1c brd ff:ff:ff:ff:ff:ff
              inet 10.240.29.28/16 brd 10.240.255.255 scope global noprefixroute eth0
                 valid_lft forever preferred_lft forever
              inet 10.240.29.28/20 brd 10.240.31.255 scope global dynamic noprefixroute eth0
                 valid_lft 20848sec preferred_lft 20848sec
              inet6 fe80::5054:ff:fe84:f01c/64 scope link
                 valid_lft forever preferred_lft forever
          3: test1pl@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
              link/ether b2:2a:ad:c2:d1:13 brd ff:ff:ff:ff:ff:ff link-netns test_ns
              inet6 fe80::b02a:adff:fec2:d113/64 scope link
                 valid_lft forever preferred_lft forever

          sanity-lnet needs to be modified to handle this.
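          To illustrate why the listing above can produce "Found 2 interfaces" for a single address, here is a minimal sketch. It is not the patch that landed and does not reproduce the actual sanity-lnet code: the helper names `count_addr_lines` and `count_unique_ifaces` are hypothetical, and the sample text is the IPv4 part of the listing above transcribed into the one-line format of `ip -o -4 addr show`.

```shell
#!/bin/bash
# IPv4 entries from the listing above, rewritten in the one-line format
# printed by `ip -o -4 addr show` (transcribed by hand for this sketch).
sample_addrs='1: lo    inet 127.0.0.1/8 scope host lo\       valid_lft forever preferred_lft forever
2: eth0    inet 10.240.29.28/16 brd 10.240.255.255 scope global noprefixroute eth0\       valid_lft forever preferred_lft forever
2: eth0    inet 10.240.29.28/20 brd 10.240.31.255 scope global dynamic noprefixroute eth0\       valid_lft 20848sec preferred_lft 20848sec'

# Hypothetical helper: count every address line that carries the given IP.
# With a duplicated address this over-counts the interfaces.
count_addr_lines() {
	local ip=$1
	awk -v ip="$ip" '{ split($4, a, "/"); if (a[1] == ip) n++ }
		END { print n + 0 }'
}

# Hypothetical helper: count distinct interface names that carry the IP,
# so the same address listed twice on eth0 is counted once.
count_unique_ifaces() {
	local ip=$1
	awk -v ip="$ip" '{ split($4, a, "/"); if (a[1] == ip && !seen[$2]++) n++ }
		END { print n + 0 }'
}

printf '%s\n' "$sample_addrs" | count_addr_lines 10.240.29.28
printf '%s\n' "$sample_addrs" | count_unique_ifaces 10.240.29.28
```

          For this sample the first call prints 2 and the second prints 1. Counting distinct interface names is one possible way to tolerate a duplicated address; the landed change may handle it differently.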

          gerrit Gerrit Updater added a comment - "Serguei Smirnov <ssmirnov@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/45551
          Subject: LU-15210 tests: sanity-lnet finds 2 interfaces for one nid
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 013253082e352dd3a6aeb2e0306964d200a7588b

          scherementsev Sergey Cheremencev added a comment - Because this test is failing 100% of the time, it is impossible to get a +1 from Maloo. Is it time to increase the priority?

          adilger Andreas Dilger added a comment - This has been failing 100% of the time since yesterday. Could it be related to the test hardware move?

          People

            ssmirnov Serguei Smirnov
            scherementsev Sergey Cheremencev
            Votes: 0
            Watchers: 6

            Dates

              Created:
              Updated:
              Resolved: