[LU-6333] Router's network status turns to "down" if its "accept_port" differs from the client's/server's Created: 05/Mar/15 Updated: 06/Mar/15 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.7.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Sarah Liu | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Environment: |
lustre-b2_7 build # 29 |
||
| Severity: | 3 |
| Rank (Obsolete): | 17733 |
| Description |
|
Router's network status turns to "down" if its "accept_port" differs from that of the client (or server).
Here are the test steps:
on router:
[root@eagle-54vm5 tests]# modprobe lnet
LNet: HW CPU cores: 1, npartitions: 1
alg: No test for adler32 (adler32-zlib)
alg: No test for crc32 (crc32-table)
[root@eagle-54vm5 tests]# lctl network up
LNet: Added LNI 192.168.200.82@tcp [8/256/0/180]
LNet: Added LNI 192.168.201.101@tcp1 [8/256/0/180]
LNet: Added LNI 192.168.202.116@tcp2 [8/256/0/180]
LNet: Accept secure, port 7988
LNET configured
[root@eagle-54vm5 tests]# lnetctl net show
net:
- net: lo
nid: 0@lo
status: up
- net: tcp
nid: 192.168.200.82@tcp
status: up
interfaces:
0: eth1
- net: tcp1
nid: 192.168.201.101@tcp1
status: up
interfaces:
0: eth2
- net: tcp2
nid: 192.168.202.116@tcp2
status: up
interfaces:
0: eth3
[root@eagle-54vm5 tests]# lnetctl net show
net:
- net: lo
nid: 0@lo
status: up
- net: tcp
nid: 192.168.200.82@tcp
status: up
interfaces:
0: eth1
- net: tcp1
nid: 192.168.201.101@tcp1
status: down
interfaces:
0: eth2
- net: tcp2
nid: 192.168.202.116@tcp2
status: down
interfaces:
0: eth3
[root@eagle-54vm5 tests]#
on client:
[root@eagle-54vm3 tests]# more /etc/modprobe.d/lustre-lnet-client1.conf
alias eth1 e1000e
alias scsi_hostadapter ahci
alias eth0 e1000e
alias ib0 ib_ipoib
#options lnet accept=all networks="tcp1(eth1)" accept_port=7988 routes="tcp0 192.168.201.101@tcp1" config_on_load=1
options lnet accept=all networks="tcp1(eth1)" routes="tcp0 192.168.201.101@tcp1" config_on_load=1
[root@eagle-54vm3 tests]# modprobe lnet
LNet: HW CPU cores: 1, npartitions: 1
alg: No test for adler32 (adler32-zlib)
alg: No test for crc32 (crc32-table)
[root@eagle-54vm3 tests]#
[root@eagle-54vm3 tests]# LNet: Added LNI 192.168.201.180@tcp1 [8/256/0/180]
LNet: Accept all, port 988
[root@eagle-54vm3 tests]# lnetctl net show
net:
- net: lo
nid: 0@lo
status: up
- net: tcp1
nid: 192.168.201.180@tcp1
status: up
interfaces:
0: eth1
[root@eagle-54vm3 tests]# lnetctl route show
route:
- net: tcp
gateway: 192.168.201.101@tcp1
[root@eagle-54vm3 tests]# lnetctl route show -v
route:
- net: tcp
gateway: 192.168.201.101@tcp1
hop: 1
priority: 0
state: down
[root@eagle-54vm3 tests]# lctl ping 192.168.201.101@tcp1
failed to ping 192.168.201.101@tcp1: Input/output error
|
| Comments |
| Comment by Isaac Huang (Inactive) [ 05/Mar/15 ] |
|
I don't think this is a bug. LNet doesn't negotiate accept_port settings over the wire; it's the admin's responsibility to configure it consistently. Negotiating it would be impossible without a third-party service on a well-known port. |
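Since accept_port has to be kept consistent by hand, a small check over each node's modprobe configuration can flag a mismatch before the modules are loaded. The following is a minimal sketch, not part of Lustre: the helper names `lnet_accept_port` and `mismatched_ports` are hypothetical, the fallback of 988 is taken from the client log in this ticket ("Accept all, port 988"), and the router's `options` line is an assumption, because the ticket only shows the router's log output ("Accept secure, port 7988").

```python
import shlex

# Port the client log in this ticket reports when accept_port is not set.
DEFAULT_ACCEPT_PORT = 988


def lnet_accept_port(conf_text):
    """Return the accept_port from the active 'options lnet' lines of a modprobe file."""
    port = DEFAULT_ACCEPT_PORT
    for line in conf_text.splitlines():
        line = line.strip()
        if line.startswith("#"):
            continue  # commented-out options must not count
        tokens = shlex.split(line)
        if tokens[:2] != ["options", "lnet"]:
            continue
        for token in tokens[2:]:
            key, _, value = token.partition("=")
            if key == "accept_port":
                port = int(value)
    return port


def mismatched_ports(configs):
    """Return {node: port} if the nodes disagree on accept_port, else {}."""
    ports = {node: lnet_accept_port(text) for node, text in configs.items()}
    return ports if len(set(ports.values())) > 1 else {}


if __name__ == "__main__":
    # Client line copied from this ticket; the router line is an assumed example.
    client_conf = (
        'options lnet accept=all networks="tcp1(eth1)" '
        'routes="tcp0 192.168.201.101@tcp1" config_on_load=1'
    )
    router_conf = "options lnet accept_port=7988"
    print(mismatched_ports({"router": router_conf, "client": client_conf}))
```

With the client line from the description, this reports the router at 7988 and the client at 988, which matches the two "Accept ... port" log lines above.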
| Comment by Sarah Liu [ 05/Mar/15 ] |
|
I understand that the admin should keep the port consistent, but it doesn't make sense to me that adding a new client with a mismatched port to an existing network causes the remote router to go down. |
| Comment by Isaac Huang (Inactive) [ 06/Mar/15 ] |
|
If that is the case, it does not make any sense. But are you sure it was the client with the mismatched accept_port that caused the router to mark its interfaces down? In other words, would the router interfaces stay in the "up" state as long as there is no client with a mismatched accept_port? |
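One way to answer this question is to capture `lnetctl net show` on the router before and after the client is started and compare the two snapshots. The sketch below assumes the output layout shown in the description of this ticket (other Lustre versions may print it differently); `net_status` and `went_down` are hypothetical helper names, not Lustre tools.

```python
def net_status(output):
    """Map each net name to its status, given 'lnetctl net show' text."""
    statuses = {}
    current = None
    for raw in output.splitlines():
        line = raw.strip()  # tolerate indented or flattened output
        if line.startswith("- net:"):
            current = line.split(":", 1)[1].strip()
        elif line.startswith("status:") and current is not None:
            statuses[current] = line.split(":", 1)[1].strip()
    return statuses


def went_down(before, after):
    """Return the nets that were 'up' in the first snapshot and 'down' in the second."""
    old, new = net_status(before), net_status(after)
    return sorted(n for n in old if old[n] == "up" and new.get(n) == "down")


if __name__ == "__main__":
    # Abbreviated snapshots based on the router output in this ticket.
    before = """net:
- net: tcp
nid: 192.168.200.82@tcp
status: up
- net: tcp1
nid: 192.168.201.101@tcp1
status: up
"""
    after = before.replace(
        "nid: 192.168.201.101@tcp1\nstatus: up",
        "nid: 192.168.201.101@tcp1\nstatus: down",
    )
    print(went_down(before, after))
```

Running the comparison once with no mismatched client present and once after adding one would show whether the status change depends on that client.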