[LU-11460] lnetctl ping giving unexpected output Created: 02/Oct/18 Updated: 04/Oct/18 Resolved: 04/Oct/18 |
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.11.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Sonia Sharma (Inactive) | Assignee: | Amir Shehata (Inactive) |
| Resolution: | Not a Bug | Votes: | 0 |
| Labels: | None | ||
| Attachments: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Node A
[root@trevis-402 ~]# lnetctl net show
net:
    - net type: lo
      local NI(s):
        - nid: 0@lo
          status: up
    - net type: tcp
      local NI(s):
        - nid: 10.9.10.2@tcp
          status: up
          interfaces:
              0: eth0
[root@trevis-402 ~]# lnetctl peer show
peer:
    - primary nid: 10.9.11.3@tcp
      Multi-Rail: True
      peer ni:
        - nid: 10.9.11.3@tcp
          state: NA
        - nid: 10.9.10.3@tcp
          state: NA
    - primary nid: 10.9.10.2@tcp
      Multi-Rail: True
      peer ni:
        - nid: 10.9.10.2@tcp
          state: NA
Node B
[root@trevis-403 ~]# lnetctl net show
net:
    - net type: lo
      local NI(s):
        - nid: 0@lo
          status: up
    - net type: tcp
      local NI(s):
        - nid: 10.9.10.3@tcp
          status: up
          interfaces:
              0: eth0
        - nid: 10.9.11.3@tcp
          status: up
          interfaces:
              0: eth1
Now delete the 'eth1' interface on Node B, so that its configuration looks like this:
[root@trevis-403 ~]# lnetctl net show
net:
    - net type: lo
      local NI(s):
        - nid: 0@lo
          status: up
    - net type: tcp
      local NI(s):
        - nid: 10.9.10.3@tcp
          status: up
          interfaces:
              0: eth0
After this step, repeated pings from Node A to Node B alternate between failing and succeeding:
[root@trevis-402 ~]# lnetctl ping 10.9.11.3@tcp
manage:
    - ping:
          errno: -1
          descr: failed to ping 10.9.11.3@tcp: Input/output error
[root@trevis-402 ~]# lnetctl ping 10.9.11.3@tcp
ping:
    - primary nid: 10.9.11.3@tcp
      Multi-Rail: True
      peer ni:
        - nid: 10.9.10.3@tcp
[root@trevis-402 ~]# lnetctl ping 10.9.11.3@tcp
manage:
    - ping:
          errno: -1
          descr: failed to ping 10.9.11.3@tcp: Input/output error
[root@trevis-402 ~]# lnetctl ping 10.9.11.3@tcp
ping:
    - primary nid: 10.9.11.3@tcp
      Multi-Rail: True
      peer ni:
        - nid: 10.9.10.3@tcp |
| Comments |
| Comment by Amir Shehata (Inactive) [ 03/Oct/18 ] |
|
I think the issue is in lnet_peer_del_nid():
    /*
     * This function only allows deletion of the primary NID if it
     * is the only NID.
     */
    if (nid == lp->lp_primary_nid && lp->lp_nnis != 1) {
        rc = -EBUSY;
        goto out;
    } |
| Comment by Sonia Sharma (Inactive) [ 04/Oct/18 ] |
|
With discovery, updated peer information is sent via a PUSH message, but if the interface that is brought down carries the primary_nid, the peer information cannot be updated. This is because the primary_nid cannot be deleted or updated, as the upper layers rely on the initially configured primary_nid. |
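The behaviour described in the two comments above can be illustrated with a small user-space model of the guard quoted from lnet_peer_del_nid(). This is only a sketch under stated assumptions, not the Lustre implementation: `struct peer_model`, `peer_model_del_nid()`, `nid_t`, and `MAX_NIS` are hypothetical names invented for the illustration, NIDs are represented as plain integers, and returning -ENOENT for an unknown NID is a choice made here rather than something taken from the ticket. The real function also deals with locking and peer state that the model omits.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NIS 8              /* hypothetical upper bound for the model */

typedef uint64_t nid_t;        /* hypothetical stand-in for an LNet NID */

/* Simplified model of a peer: one primary NID plus its list of NIDs. */
struct peer_model {
    nid_t primary_nid;
    nid_t nis[MAX_NIS];
    int   nnis;                /* number of valid entries in nis[] */
};

/*
 * Remove 'nid' from the peer. Mirrors the guard quoted in the ticket:
 * the primary NID may only be deleted when it is the only NID left.
 */
static int peer_model_del_nid(struct peer_model *lp, nid_t nid)
{
    int i;

    assert(lp != NULL);

    if (nid == lp->primary_nid && lp->nnis != 1)
        return -EBUSY;

    for (i = 0; i < lp->nnis; i++) {
        if (lp->nis[i] == nid) {
            /* Fill the hole with the last entry and shrink the list. */
            lp->nis[i] = lp->nis[lp->nnis - 1];
            lp->nnis--;
            return 0;
        }
    }

    return -ENOENT;            /* assumption: unknown NID is reported this way */
}
```

In this model, a peer with NIDs {10.9.11.3, 10.9.10.3} whose primary is 10.9.11.3 rejects deletion of 10.9.11.3 with -EBUSY, so the stale primary stays in place. That matches the peer entry Node A keeps showing in the ticket after eth1 is removed on Node B.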