[LU-5875] DLC: failed adding an existing network interface when there is traffic ongoing
Created: 05/Nov/14  Updated: 19/Jan/15  Resolved: 19/Jan/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.7.0
Fix Version/s: Lustre 2.7.0

Type: Bug
Priority: Major
Reporter: Sarah Liu
Assignee: Amir Shehata (Inactive)
Resolution: Fixed
Votes: 0
Labels: None

Issue Links:
Related
is related to LU-2456 Dynamic LNet Config Main Development ... Resolved
is related to LU-5874 DLC: the ongoing traffic was interrup... Resolved
Severity: 3
Rank (Obsolete): 16427

 Description   

1. Set up the system and run sanity.
2. While sanity is running, add an already-configured network interface on the client side. The command fails with the error messages below.

[root@onyx-28 ~]# lnetctl net show -v
net:
    - nid: 0@lo
      status: up
      tunables:
          peer_timeout: 0
          peer_credits: 0
          peer_buffer_credits: 0
          credits: 0
          CPTs:  0
    - nid: 10.2.4.74@tcp
      status: up
      interfaces:
          0: eth0
      tunables:
          peer_timeout: 180
          peer_credits: 8
          peer_buffer_credits: 0
          credits: 256
          CPTs:  0

[root@onyx-28 ~]# lnetctl net add --net tcp --if eth0
add:
    - net:
          errno: -22
          descr: "cannot add network: Invalid argument"
Lustre: DEBUG MARKER: == sanity test 24v: list directory with large files (handle hash collision, bug: 17560) == 12:10:03 (1415218203)
LNetError: 31891:0:(api-ni.c:1488:lnet_startup_lndnis()) Net tcp is not unique
LNetError: 31897:0:(api-ni.c:1488:lnet_startup_lndnis()) Net tcp is not unique
LNetError: 31899:0:(api-ni.c:1488:lnet_startup_lndnis()) Net tcp is not unique

 - created 10000 (time 1415218213.63 total 10.13 last 10.13)
Lustre: DEBUG MARKER: cancel_lru_locks mdc start
 - created 20000 (time 1415218224.04 total 20.54 last 10.41)


 Comments   
Comment by Jodi Levi (Inactive) [ 06/Nov/14 ]

Amir,
Can you please have a look at this one?
Thank you!

Comment by Amir Shehata (Inactive) [ 06/Nov/14 ]

You cannot re-add an existing network. As the errors report, the add fails because the network is not unique.

This is not a bug.

Comment by Sarah Liu [ 08/Nov/14 ]

Then could you please update the test plan?
"Test Case Name dynLNet.system.net_existing"

Comment by Andreas Dilger [ 02/Dec/14 ]

Amir, it would be good to fix the error to return -EEXIST in this case, not -EINVAL, so that it prints out a more useful error message for the user.

Sarah, can you please re-run this test with a different network interface to verify that this is working correctly? This functionality is the whole reason for DLC, so it should work.
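
For illustration only, here is a minimal user-space sketch of the distinction suggested above: a duplicate network is reported as -EEXIST, while -EINVAL is kept for malformed requests. It is not the LNet code from api-ni.c. The names net_table, net_table_add, and describe_add_error are hypothetical, and the message format simply mirrors the "cannot add network: ..." text shown in the description.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the list of configured networks. */
struct net_table {
	const char *names[8];
	size_t count;
};

/*
 * Hypothetical helper: add a network name to the table.
 * Returns 0 on success, -EEXIST if the name is already present,
 * -EINVAL for a malformed request, -ENOSPC if the table is full.
 */
static int net_table_add(struct net_table *tbl, const char *name)
{
	size_t i;

	if (tbl == NULL || name == NULL || name[0] == '\0')
		return -EINVAL;

	for (i = 0; i < tbl->count; i++) {
		/* A duplicate is a distinct failure from a bad argument. */
		if (strcmp(tbl->names[i], name) == 0)
			return -EEXIST;
	}

	if (tbl->count == sizeof(tbl->names) / sizeof(tbl->names[0]))
		return -ENOSPC;

	tbl->names[tbl->count++] = name;
	return 0;
}

/* Build a user-facing message from a negative errno value. */
static void describe_add_error(int rc, char *buf, size_t len)
{
	snprintf(buf, len, "cannot add network: %s", strerror(-rc));
}
```

With a table that already holds "tcp", a second net_table_add(&tbl, "tcp") returns -EEXIST. On glibc, strerror(EEXIST) reads "File exists", which points the user at the duplicate rather than at a bad argument.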

Comment by Sarah Liu [ 02/Dec/14 ]

Andreas,

Adding a different network interface needs at least two test nodes configured with 3 interfaces. Currently all Onyx nodes have only 2 (tcp0 and ib0) hooked up, so I have opened TEI-2972 to request some test nodes configured with 3 interfaces.

Comment by Gerrit Updater [ 12/Dec/14 ]

Amir Shehata (amir.shehata@intel.com) uploaded a new patch: http://review.whamcloud.com/13056
Subject: LU-5875 lnet: return -EEXIST if NI is not unique
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: e2606de22e31866de70df0f1b1c8178aef3ff49f

Comment by Gerrit Updater [ 19/Jan/15 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/13056/
Subject: LU-5875 lnet: return -EEXIST if NI is not unique
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 7d63c00e24d77d931642ce6cf5e8ff4cc2cad255

Comment by Peter Jones [ 19/Jan/15 ]

Landed for 2.7
