[LU-15232] sanity-lnet : @@@@@@ FAIL: Found 2 interfaces for NID Created: 11/Nov/21  Updated: 05/May/22  Resolved: 14/Nov/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.15.0

Type: Bug
Priority: Critical
Reporter: Sergey Cheremencev
Assignee: Serguei Smirnov
Resolution: Done
Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

The sanity-lnet session fails with the following error:

 sanity-lnet : @@@@@@ FAIL: Found 2 interfaces for NID 10.240.22.240@tcp. Expect 1 
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:6332:error()
  = /usr/lib64/lustre/tests/sanity-lnet.sh:266:main() 

I hit it on several different patches:
https://testing.whamcloud.com/test_sets/730b47fe-077e-4d12-ae23-7e779c9ef774
https://testing.whamcloud.com/test_sets/26a93006-3d7b-45df-b37f-84cbd895523b
https://testing.whamcloud.com/test_sets/872224ab-3a6f-4ded-9fa2-b8d274baf5d1

 

 



 Comments   
Comment by Andreas Dilger [ 11/Nov/21 ]

This has been failing 100% of the time since yesterday. Could it be related to the test hardware move?

Comment by Sergey Cheremencev [ 12/Nov/21 ]

Because of the 100% failure rate, it is impossible to get a +1 from Maloo.
Is it time to raise the priority?

Comment by Gerrit Updater [ 12/Nov/21 ]

"Serguei Smirnov <ssmirnov@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/45551
Subject: LU-15210 tests: sanity-lnet finds 2 interfaces for one nid
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 013253082e352dd3a6aeb2e0306964d200a7588b

Comment by Serguei Smirnov [ 12/Nov/21 ]

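It appears that the issue is caused by the following configuration, in which eth0 carries the same IPv4 address (10.240.29.28) twice, once with a /16 prefix and once as a dynamic address with a /20 prefix: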

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:84:f0:1c brd ff:ff:ff:ff:ff:ff
    inet 10.240.29.28/16 brd 10.240.255.255 scope global noprefixroute eth0
       valid_lft forever preferred_lft forever
    inet 10.240.29.28/20 brd 10.240.31.255 scope global dynamic noprefixroute eth0
       valid_lft 20848sec preferred_lft 20848sec
    inet6 fe80::5054:ff:fe84:f01c/64 scope link 
       valid_lft forever preferred_lft forever
3: test1pl@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether b2:2a:ad:c2:d1:13 brd ff:ff:ff:ff:ff:ff link-netns test_ns
    inet6 fe80::b02a:adff:fec2:d113/64 scope link 
       valid_lft forever preferred_lft forever 

sanity-lnet needs to be modified to handle this.
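
As an illustration only (this is not the code from the landed patch), the sketch below shows one way a test could count interfaces for an address without being misled by a duplicate entry. It assumes input in the one-line-per-address format printed by `ip -o -4 addr show`, where field 2 is the interface name and field 4 is ADDRESS/PREFIX. The helper name count_ifaces_for_ip and the sample text are hypothetical; the sample is modeled on the eth0 entries above.

```shell
#!/bin/bash
# Hypothetical helper, NOT taken from sanity-lnet.sh or the landed patch.
# Reads `ip -o -4 addr show` style lines on stdin and prints the number of
# distinct interfaces that carry the given IPv4 address.
count_ifaces_for_ip() {
    local ip="$1"

    # Field 2 is the interface name, field 4 is ADDRESS/PREFIX.
    # Keying the array on the interface name collapses duplicate entries
    # of the same address on one interface into a single hit.
    awk -v ip="$ip" '
        { split($4, a, "/") }
        a[1] == ip { seen[$2] = 1 }
        END { n = 0; for (i in seen) n++; print n }
    '
}

# Sample input (assumed format): the same address listed twice on eth0,
# once with a /16 prefix and once with a /20 prefix.
sample='2: eth0    inet 10.240.29.28/16 brd 10.240.255.255 scope global noprefixroute eth0\       valid_lft forever preferred_lft forever
2: eth0    inet 10.240.29.28/20 brd 10.240.31.255 scope global dynamic noprefixroute eth0\       valid_lft 20848sec preferred_lft 20848sec'

# Counting matching lines sees two entries for the address ...
printf '%s\n' "$sample" | grep -cF 'inet 10.240.29.28/'
# ... while counting distinct interface names sees one interface.
printf '%s\n' "$sample" | count_ifaces_for_ip 10.240.29.28
```

With the sample input, the line count is 2 and the distinct-interface count is 1. On a live node the helper could be fed from `ip -o -4 addr show` directly.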

Comment by Gerrit Updater [ 14/Nov/21 ]

"Andreas Dilger <adilger@whamcloud.com>" merged in patch https://review.whamcloud.com/45551/
Subject: LU-15210 tests: fix sanity-lnet to handle duplicate IP
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: a50eaae974ee04364c9fbbb4625dd3d581a8c986

Comment by Andreas Dilger [ 14/Nov/21 ]

Patch landed for 2.15, also needs to land for other branches.

Comment by Andreas Dilger [ 15/Nov/21 ]

NB: this was originally LU-15210, but it was temporarily moved to a different project and then moved back, so it now has a new ticket number, LU-15232.

Comment by Alexey Lyashkov [ 23/Nov/21 ]

Andreas, this looks like an issue with our node configuration: the first address comes from the static config, and the second from the DHCP client.
So the simplest approach is to fix our own configuration rather than the Lustre test-framework.
