[LU-11460] lnetctl ping giving unexpected output Created: 02/Oct/18  Updated: 04/Oct/18  Resolved: 04/Oct/18

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.11.0
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Sonia Sharma (Inactive) Assignee: Amir Shehata (Inactive)
Resolution: Not a Bug Votes: 0
Labels: None

Attachments: HTML File errorlog    
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Node A

[root@trevis-402 ~]# lnetctl net show
net:
    - net type: lo
      local NI(s):
        - nid: 0@lo
          status: up
    - net type: tcp
      local NI(s):
        - nid: 10.9.10.2@tcp
          status: up
          interfaces:
              0: eth0

[root@trevis-402 ~]# lnetctl peer show
peer:
    - primary nid: 10.9.11.3@tcp
      Multi-Rail: True
      peer ni:
        - nid: 10.9.11.3@tcp
          state: NA
        - nid: 10.9.10.3@tcp
          state: NA
    - primary nid: 10.9.10.2@tcp
      Multi-Rail: True
      peer ni:
        - nid: 10.9.10.2@tcp
          state: NA

 

Node B

[root@trevis-403 ~]# lnetctl net show
net:
    - net type: lo
      local NI(s):
        - nid: 0@lo
          status: up
    - net type: tcp
      local NI(s):
        - nid: 10.9.10.3@tcp
          status: up
          interfaces:
              0: eth0
        - nid: 10.9.11.3@tcp
          status: up
          interfaces:
              0: eth1

 

Now delete the 'eth1' interface on Node B, so that the configuration looks like this:

[root@trevis-403 ~]# lnetctl net show
net:
    - net type: lo
      local NI(s):
        - nid: 0@lo
          status: up
    - net type: tcp
      local NI(s):
        - nid: 10.9.10.3@tcp
          status: up
          interfaces:
              0: eth0

 

After this step, repeated pings from Node A to the removed NID on Node B (10.9.11.3@tcp) alternate between failing with an I/O error and succeeding:

[root@trevis-402 ~]# lnetctl ping 10.9.11.3@tcp
manage:
    - ping:
          errno: -1
          descr: failed to ping 10.9.11.3@tcp: Input/output error
                 
[root@trevis-402 ~]# lnetctl ping 10.9.11.3@tcp
ping:
    - primary nid: 10.9.11.3@tcp
      Multi-Rail: True
      peer ni:
        - nid: 10.9.10.3@tcp
[root@trevis-402 ~]# lnetctl ping 10.9.11.3@tcp
manage:
    - ping:
          errno: -1
          descr: failed to ping 10.9.11.3@tcp: Input/output error
                 
[root@trevis-402 ~]# lnetctl ping 10.9.11.3@tcp
ping:
    - primary nid: 10.9.11.3@tcp
      Multi-Rail: True
      peer ni:
        - nid: 10.9.10.3@tcp


 Comments   
Comment by Amir Shehata (Inactive) [ 03/Oct/18 ]

I think the issue is in lnet_peer_del_nid():

    /*
     * This function only allows deletion of the primary NID if it
     * is the only NID.
     */
    if (nid == lp->lp_primary_nid && lp->lp_nnis != 1) {
        rc = -EBUSY;
        goto out;
    }

Comment by Sonia Sharma (Inactive) [ 04/Oct/18 ]

With discovery, updated peer information is sent via a PUSH message. However, if the interface that is brought down carries the peer's primary NID (primary_nid), the peer information cannot be updated. The primary_nid cannot be deleted or changed, because the upper layers rely on the initially configured primary_nid.
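
To illustrate the guard quoted from lnet_peer_del_nid(), here is a minimal user-space sketch. It is not the kernel implementation: the toy_peer structure, the toy_peer_del_nid() function, the fixed-size NID array and the plain integer NIDs are all hypothetical stand-ins. Only the primary-NID check mirrors the quoted code.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; not the real LNet types. */
#define TOY_MAX_NIS 4

typedef uint64_t toy_nid_t;

struct toy_peer {
    toy_nid_t lp_primary_nid;        /* NID the upper layers know the peer by */
    int       lp_nnis;               /* number of NIDs currently recorded */
    toy_nid_t lp_nids[TOY_MAX_NIS];  /* recorded NIDs, primary included */
};

/*
 * Remove nid from the peer. Returns 0 on success, -EBUSY if nid is the
 * primary NID and other NIDs remain, -ENOENT if nid is not recorded.
 */
int toy_peer_del_nid(struct toy_peer *lp, toy_nid_t nid)
{
    int i;

    /* Same condition as the quoted guard: the primary NID may only
     * be deleted when it is the only NID left. */
    if (nid == lp->lp_primary_nid && lp->lp_nnis != 1)
        return -EBUSY;

    for (i = 0; i < lp->lp_nnis; i++) {
        if (lp->lp_nids[i] == nid) {
            /* Fill the hole with the last entry and shrink the list. */
            lp->lp_nids[i] = lp->lp_nids[lp->lp_nnis - 1];
            lp->lp_nnis--;
            return 0;
        }
    }
    return -ENOENT;
}
```

In this model, a peer recorded with two NIDs refuses deletion of its primary NID with -EBUSY until the other NID has been removed. That matches the situation described above, where Node A still lists 10.9.11.3@tcp as the primary NID of Node B after eth1 is gone.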

Generated at Sat Feb 10 02:44:04 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.