[LU-6333] Router's network status turns to "down" if its "accept_port" differs from the client's or server's Created: 05/Mar/15  Updated: 06/Mar/15

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.7.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Sarah Liu Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None
Environment:

lustre-b2_7 build # 29


Severity: 3
Rank (Obsolete): 17733

 Description   

The router's network status turns to "down" if its "accept_port" differs from that of a client (or server).

Test steps:
1. Set up the router with 3 different interfaces and accept_port=7988; the status shows that all networks are up.
2. Load lnet on a client that uses the default accept_port 988 and lists the router in its configuration file.
3. On the client, run lctl ping, then check the network status on the router: it now shows as down.

on router:

[root@eagle-54vm5 tests]# modprobe lnet
LNet: HW CPU cores: 1, npartitions: 1
alg: No test for adler32 (adler32-zlib)
alg: No test for crc32 (crc32-table)
[root@eagle-54vm5 tests]# lctl network up
LNet: Added LNI 192.168.200.82@tcp [8/256/0/180]
LNet: Added LNI 192.168.201.101@tcp1 [8/256/0/180]
LNet: Added LNI 192.168.202.116@tcp2 [8/256/0/180]
LNet: Accept secure, port 7988
LNET configured
[root@eagle-54vm5 tests]# lnetctl net show
net:
    - net: lo
      nid: 0@lo
      status: up
    - net: tcp
      nid: 192.168.200.82@tcp
      status: up
      interfaces:
          0: eth1
    - net: tcp1
      nid: 192.168.201.101@tcp1
      status: up
      interfaces:
          0: eth2
    - net: tcp2
      nid: 192.168.202.116@tcp2
      status: up
      interfaces:
          0: eth3
[root@eagle-54vm5 tests]# lnetctl net show
net:
    - net: lo
      nid: 0@lo
      status: up
    - net: tcp
      nid: 192.168.200.82@tcp
      status: up
      interfaces:
          0: eth1
    - net: tcp1
      nid: 192.168.201.101@tcp1
      status: down
      interfaces:
          0: eth2
    - net: tcp2
      nid: 192.168.202.116@tcp2
      status: down
      interfaces:
          0: eth3
[root@eagle-54vm5 tests]# 

on client:

[root@eagle-54vm3 tests]# more /etc/modprobe.d/lustre-lnet-client1.conf 
alias eth1 e1000e
alias scsi_hostadapter ahci
alias eth0 e1000e
alias ib0 ib_ipoib

#options lnet accept=all networks="tcp1(eth1)" accept_port=7988 routes="tcp0 192.168.201.101@tcp1" config_on_load=1
options lnet accept=all networks="tcp1(eth1)" routes="tcp0 192.168.201.101@tcp1" config_on_load=1

[root@eagle-54vm3 tests]# modprobe lnet
LNet: HW CPU cores: 1, npartitions: 1
alg: No test for adler32 (adler32-zlib)
alg: No test for crc32 (crc32-table)

[root@eagle-54vm3 tests]# 
[root@eagle-54vm3 tests]# LNet: Added LNI 192.168.201.180@tcp1 [8/256/0/180]
LNet: Accept all, port 988
[root@eagle-54vm3 tests]# lnetctl net show
net:
    - net: lo
      nid: 0@lo
      status: up
    - net: tcp1
      nid: 192.168.201.180@tcp1
      status: up
      interfaces:
          0: eth1
[root@eagle-54vm3 tests]# lnetctl route show
route:
    - net: tcp
      gateway: 192.168.201.101@tcp1
[root@eagle-54vm3 tests]# lnetctl route show -v
route:
    - net: tcp
      gateway: 192.168.201.101@tcp1
      hop: 1
      priority: 0
      state: down
[root@eagle-54vm3 tests]# lctl ping 192.168.201.101@tcp1
failed to ping 192.168.201.101@tcp1: Input/output error


 Comments   
Comment by Isaac Huang (Inactive) [ 05/Mar/15 ]

I don't think this is a bug. LNet doesn't negotiate accept_port settings over the wire. It's the admin's responsibility to set it up consistently. It's impossible to negotiate it without using a third-party service at a well-known port.
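
Because the port is not negotiated, one way to catch a mismatch before loading lnet is to compare each node's modprobe configuration offline. The following is a minimal sketch, not part of LNet or its tools. It assumes that each node's settings live in an `options lnet ...` line like the one in the client file above, that an unset accept_port means the default of 988, and that the last assignment wins. The function names and the one-line router configuration are hypothetical; the router's actual file is not shown in this ticket.

```python
import shlex

# Default taken from the client log line "LNet: Accept all, port 988".
DEFAULT_ACCEPT_PORT = 988


def lnet_accept_port(conf_text):
    """Return the accept_port from active `options lnet` lines, else the default."""
    port = DEFAULT_ACCEPT_PORT
    for line in conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and commented-out options
        tokens = shlex.split(line)  # keeps quoted values such as routes="..." whole
        if tokens[:2] != ["options", "lnet"]:
            continue
        for token in tokens[2:]:
            key, _, value = token.partition("=")
            if key == "accept_port":
                port = int(value)  # assumption: the last assignment wins
    return port


def mismatched_ports(configs):
    """Return {node: port} if the nodes disagree on accept_port, else {}."""
    ports = {node: lnet_accept_port(text) for node, text in configs.items()}
    return ports if len(set(ports.values())) > 1 else {}


# Hypothetical router line; only the port value 7988 comes from the ticket.
ROUTER_CONF = "options lnet accept_port=7988\n"

# Active and commented-out lines as shown in the client's modprobe file.
CLIENT_CONF = (
    '#options lnet accept=all networks="tcp1(eth1)" accept_port=7988 '
    'routes="tcp0 192.168.201.101@tcp1" config_on_load=1\n'
    'options lnet accept=all networks="tcp1(eth1)" '
    'routes="tcp0 192.168.201.101@tcp1" config_on_load=1\n'
)

if __name__ == "__main__":
    print(mismatched_ports({"router": ROUTER_CONF, "client": CLIENT_CONF}))
```

An empty result means every node agrees on the port. With the two configurations above, the router and client are reported with different ports, which matches the setup in this report.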

Comment by Sarah Liu [ 05/Mar/15 ]

I understand that the admin should keep the port consistent, but it doesn't make sense to me that adding a new client with a mismatched port to an existing network causes the remote router to go down.

Comment by Isaac Huang (Inactive) [ 06/Mar/15 ]

If that is the case, it does not make any sense. But are you sure it was the client with unmatched accept_port that caused the router to mark its interfaces down? In other words, the router interfaces would stay in "up" state as long as there's no client with unmatched accept_port?
