[LU-12824] Unable to add single Infiniband interface to multiple o2ib LNets Created: 30/Sep/19  Updated: 05/Feb/20  Resolved: 09/Oct/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.13.0, Lustre 2.12.3
Fix Version/s: Lustre 2.13.0, Lustre 2.12.4

Type: Bug Priority: Critical
Reporter: Chris Horn Assignee: Chris Horn
Resolution: Fixed Votes: 0
Labels: patch

Issue Links:
Duplicate
Related
Severity: 3

 Description   

Configuring a single IB interface on multiple LNets was broken by

commit 75ab841d92a7109cf9f4da69a58ae4d21d360a4c
Author: James Simmons <jsimmons@infradead.org>
Date:   Mon Jul 8 10:42:47 2019 -0700

   LU-11893 lnet: consoldate secondary IP address handling

Prior to this commit, when configuring an ib device for multiple LNets, we would only create a single struct ib_dev object. This object was created via a call to kiblnd_create_dev(). That function initializes the ib_dev object with a call to kiblnd_dev_failover(). kiblnd_dev_failover() creates the struct rdma_cm_id object, and calls rdma_bind_addr(). When the ib_dev object is created successfully, it is added to a global list of devices:

        list_add_tail(&dev->ibd_list,
                          &kiblnd_data.kib_devs);

When the interface is added to additional LNets, the kiblnd_startup() routine searches the kiblnd_data.kib_devs list to see if there is an existing ib_dev object for the interface being configured. If it finds one, then that ib_dev object is re-used.
The LU-11893 patch noted above removed the logic for searching this list for an existing ib_dev object. kiblnd_startup() now always creates a new ib_dev object, so the address is bound a second time, which I believe results in the EADDRINUSE (-98).
It should be pretty straightforward to re-introduce the logic for searching the kib_devs list.
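
The search-and-reuse behaviour described above can be sketched in user space roughly as follows. This is only an illustration, not the actual o2iblnd code: struct demo_dev, demo_devs, demo_dev_search(), demo_dev_create() and demo_startup() are hypothetical stand-ins for the ib_dev object, the kiblnd_data.kib_devs list and the kiblnd_startup() path, and the nnets counter is an assumption added purely to make the reuse visible.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the per-interface device object. */
struct demo_dev {
	char             ifname[16];	/* interface name, e.g. "ib0" */
	int              nnets;		/* LNets sharing this device (illustrative) */
	struct demo_dev *next;
};

/* Hypothetical stand-in for the global kiblnd_data.kib_devs list. */
static struct demo_dev *demo_devs;

/* Walk the global list; return the device for ifname, or NULL if none. */
static struct demo_dev *demo_dev_search(const char *ifname)
{
	struct demo_dev *dev;

	for (dev = demo_devs; dev != NULL; dev = dev->next)
		if (strcmp(dev->ifname, ifname) == 0)
			return dev;
	return NULL;
}

/*
 * Allocate a device and link it into the global list. In the real
 * driver this is the step where the address gets bound, so it must
 * happen only once per interface.
 */
static struct demo_dev *demo_dev_create(const char *ifname)
{
	struct demo_dev *dev = calloc(1, sizeof(*dev));

	if (dev == NULL)
		return NULL;
	snprintf(dev->ifname, sizeof(dev->ifname), "%s", ifname);
	dev->next = demo_devs;
	demo_devs = dev;
	return dev;
}

/* Startup path: reuse an existing device if present, else create one. */
static struct demo_dev *demo_startup(const char *ifname)
{
	struct demo_dev *dev;

	assert(ifname != NULL);
	dev = demo_dev_search(ifname);
	if (dev == NULL)
		dev = demo_dev_create(ifname);
	if (dev != NULL)
		dev->nnets++;
	return dev;
}
```

In this sketch, calling demo_startup("ib0") twice returns the same object both times, so the create step (and hence the bind in the real driver) runs only once per interface.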

Reproducer with kernel module parameter:

[root@snx11922n002 ~]# cat /etc/lustre/ip2nets.dat
o2ib040(ib0) 10.12.0.*;
o2ib041(ib0) 10.12.0.50;
[root@snx11922n002 ~]# modprobe lnet
[root@snx11922n002 ~]# lctl net up
LNET configure error 100: Network is down
[root@snx11922n002 ~]# dmesg | tail
[604327.506043] alg: No test for adler32 (adler32-zlib)
[604327.512517] alg: No test for crc32 (crc32-table)
[604328.280286] LNet: live_router_check_interval and dead_router_check_interval have been deprecated. Use alive_router_check_interval instead. Ignoring these deprecated parameters.
[604330.561491] LNet: 3809:0:(config.c:1641:lnet_inet_enumerate()) lnet: Ignoring interface eth2: it's down
[604330.591143] LNet: Using FastReg for registration
[604330.614353] LNet: Added LNI 10.12.0.50@o2ib40 [16/2048/0/0]
[604330.621410] LNetError: 3809:0:(o2iblnd.c:2776:kiblnd_dev_failover()) Failed to bind ib0:10.12.0.50 to device(ffff881f96ff8000): -98
[604330.636010] LNetError: 3809:0:(o2iblnd.c:3266:kiblnd_startup()) ko2iblnd: Can't initialize device: rc = -98
[604330.647163] LNetError: 105-4: Error -100 starting up LNI o2ib
[604331.659240] LNet: Removed LNI 10.12.0.50@o2ib40

Reproducer with lnetctl:

[root@snx11922n002 ~]# modprobe lnet
[root@snx11922n002 ~]# lctl mark mark
[root@snx11922n002 ~]# lnetctl lnet configure
[root@snx11922n002 ~]# lnetctl net add --net o2ib040 --if ib0
[root@snx11922n002 ~]# lnetctl net add --net o2ib041 --if ib0
add:
    - net:
          errno: -100
          descr: "cannot add network: Network is down"
[root@snx11922n002 ~]# dmesg | tail
[604760.221364] alg: No test for crc32 (crc32-table)
[604760.983433] LNet: live_router_check_interval and dead_router_check_interval have been deprecated. Use alive_router_check_interval instead. Ignoring these deprecated parameters.
[604763.557036] Lustre: DEBUG MARKER: mark
[604777.372005] LNet: 7487:0:(config.c:1641:lnet_inet_enumerate()) lnet: Ignoring interface eth2: it's down
[604777.382924] LNet: Using FastReg for registration
[604777.402400] LNet: Added LNI 10.12.0.50@o2ib40 [16/2048/0/0]
[604781.025699] LNet: 7528:0:(config.c:1641:lnet_inet_enumerate()) lnet: Ignoring interface eth2: it's down
[604781.036209] LNetError: 7528:0:(o2iblnd.c:2776:kiblnd_dev_failover()) Failed to bind ib0:10.12.0.50 to device(ffff881f96ff8000): -98
[604781.050103] LNetError: 7528:0:(o2iblnd.c:3266:kiblnd_startup()) ko2iblnd: Can't initialize device: rc = -98
[604781.060933] LNetError: 105-4: Error -100 starting up LNI o2ib
[root@snx11922n002 ~]#


 Comments   
Comment by Gerrit Updater [ 30/Sep/19 ]

Chris Horn (hornc@cray.com) uploaded a new patch: https://review.whamcloud.com/36324
Subject: LU-12824 o2ib: Fix whitespace in kiblnd_startup
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 4183cb54e57c9c371ec433aeda300951fe2a1aee

Comment by Gerrit Updater [ 30/Sep/19 ]

Chris Horn (hornc@cray.com) uploaded a new patch: https://review.whamcloud.com/36325
Subject: LU-12824 o2ib: Record rc in debug log on startup failure
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: d36f53f31c71774a9b1eb781829f567ad4ae0801

Comment by Gerrit Updater [ 30/Sep/19 ]

Chris Horn (hornc@cray.com) uploaded a new patch: https://review.whamcloud.com/36326
Subject: LU-12824 o2ib: Reintroduce kiblnd_dev_search
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 3d4c21a773897b242c6403ae89fe37144bb4cc88

Comment by James A Simmons [ 01/Oct/19 ]

If this is really supported I don't think this has been explored for ksocklnd. Does it work there as well?

Comment by Chris Horn [ 01/Oct/19 ]

Yes, it works with ksocklnd.

sles15build01:~ # lnetctl net add --net tcp --if eth0
sles15build01:~ # lnetctl net add --net tcp1 --if eth0
sles15build01:~ # lctl list_nids
192.168.2.20@tcp
192.168.2.20@tcp1
sles15build01:~ #

Comment by James A Simmons [ 01/Oct/19 ]

If such a configuration is allowed, this opens up issues about failover pairs and how health behaves in this kind of setup. I do expect there are corner cases hidden in such a setup. If this is allowed, I guess IP alias support is not really needed.

Comment by Chris Horn [ 01/Oct/19 ]

I'm really surprised by your reaction to this change. Cray published the LNet fine grained routing paper at CUG 2013. We've been using this kind of config in production for 6 years.

Comment by James A Simmons [ 01/Oct/19 ]

At ORNL we implemented this differently. It just comes as a surprise that such a setup was possible. When I talked to Olaf, he had the same reaction. I'm not against supporting such a setup, but with LNet health and failover pairing I wonder what corner cases could exist. We should really exercise this in your LNet test suite.

I have had this happen in the past on other projects. The API is not clearly defined in some area, and some company implements something no one expected. Then the change is brought before the standards board to sort it out. In this case it's Amir.

Comment by Amir Shehata (Inactive) [ 01/Oct/19 ]

This config has historically been supported. LNet is designed to act as a virtual network over the physical network. One use case for this configuration is to segregate LNet traffic going over the same interface.

As regards interaction with health: since at the LNet level these are two different NIDs, their health values will be managed independently. When there is a failure to send over one of these NIDs, its health value will be decremented and it will be added to the recovery queue. As far as I can see, it should work at the LNet level.

The drawbacks I see with this type of configuration are performance and security. Performance, since you're sharing the same link. Security, because traffic is using the same link and you can just sniff traffic on both NIDs.

Comment by Gerrit Updater [ 04/Oct/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36324/
Subject: LU-12824 o2ib: Fix whitespace in kiblnd_startup
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 50300e83e4cab3157149107eb735825cc4c3aff1

Comment by Gerrit Updater [ 09/Oct/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36325/
Subject: LU-12824 o2ib: Record rc in debug log on startup failure
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 99f85541a685df82265f18167e91c161c523ce50

Comment by Gerrit Updater [ 09/Oct/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36326/
Subject: LU-12824 o2ib: Reintroduce kiblnd_dev_search
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: e25e45c612a061031e8b4b5233137fbb57b50cc4

Comment by Peter Jones [ 09/Oct/19 ]

Landed for 2.13

Comment by Alex Parga [ 21/Oct/19 ]

Is this fix expected to land for 2.12?

Comment by Peter Jones [ 22/Oct/19 ]

I have marked it as a candidate for a future 2.12.x release.

Comment by Gerrit Updater [ 22/Oct/19 ]

Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36545
Subject: LU-12824 o2ib: Reintroduce kiblnd_dev_search
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: fe6666b21f421d0fd948489ce8d30c007f5d94f1

Comment by Gerrit Updater [ 22/Oct/19 ]

Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36546
Subject: LU-12824 o2ib: Fix whitespace in kiblnd_startup
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: 1f04b73ce39a9d181d0ba689bbaf993f348ea250

Comment by Gerrit Updater [ 22/Oct/19 ]

Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36547
Subject: LU-12824 o2ib: Record rc in debug log on startup failure
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: 9d021ae9f819f8a15812c90af33a0604452b4bf9

Comment by Gerrit Updater [ 21/Nov/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36545/
Subject: LU-12824 o2ib: Reintroduce kiblnd_dev_search
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: ddc3b77811f402315e390f463cff6bf517c35a8c

Comment by Gerrit Updater [ 21/Nov/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36546/
Subject: LU-12824 o2ib: Fix whitespace in kiblnd_startup
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: adfb05766dec3ae1c7fc082600be3d00db2e25e1

Comment by Gerrit Updater [ 21/Nov/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36547/
Subject: LU-12824 o2ib: Record rc in debug log on startup failure
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: 5ddc9b21b975518d548474d82fae72be6832b0c2
