
Unable to add single Infiniband interface to multiple o2ib LNets


    Description

      Configuring a single IB interface on multiple LNets was broken by

      commit 75ab841d92a7109cf9f4da69a58ae4d21d360a4c
      Author: James Simmons <jsimmons@infradead.org>
      Date:   Mon Jul 8 10:42:47 2019 -0700
      
         LU-11893 lnet: consoldate secondary IP address handling
      

      Prior to this commit, when configuring an ib device for multiple LNets, we would only create a single struct ib_dev object. This object was created via a call to kiblnd_create_dev(). That function initializes the ib_dev object with a call to kiblnd_dev_failover(). kiblnd_dev_failover() creates the struct rdma_cm_id object, and calls rdma_bind_addr(). When the ib_dev object is created successfully, it is added to a global list of devices:

              list_add_tail(&dev->ibd_list,
                                &kiblnd_data.kib_devs);
      

      When the interface was added to additional LNets, the kiblnd_startup() routine searched the kiblnd_data.kib_devs list for an existing ib_dev object for the interface being configured. If it found one, that ib_dev object was re-used.
      The LU-11893 patch noted above removed the logic for searching this list for an existing ib_dev object. kiblnd_startup() now always creates a new ib_dev object, so the same address is bound a second time, which I believe results in the EADDRINUSE (-98) error.
      It should be fairly straightforward to reintroduce the logic for searching the kib_devs list.
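
The search-then-create pattern described above can be sketched in user-space C. This is only an illustration under stated assumptions: the names demo_dev, demo_devs, demo_dev_search() and demo_dev_get() are hypothetical and are not the ko2iblnd source, a plain singly linked list stands in for kiblnd_data.kib_devs, and the rdma_cm_id setup and rdma_bind_addr() call are omitted entirely.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_IFNAME_MAX 16

/* Hypothetical stand-in for the per-interface device object. */
struct demo_dev {
	char ifname[DEMO_IFNAME_MAX];	/* interface name, e.g. "ib0" */
	int nnets;			/* number of LNets sharing this device */
	struct demo_dev *next;		/* next device in the global list */
};

/* Hypothetical stand-in for the global device list. */
static struct demo_dev *demo_devs;

/* Return the existing device for ifname, or NULL if none is registered. */
static struct demo_dev *demo_dev_search(const char *ifname)
{
	struct demo_dev *dev;

	for (dev = demo_devs; dev != NULL; dev = dev->next) {
		if (strcmp(dev->ifname, ifname) == 0)
			return dev;
	}
	return NULL;
}

/*
 * Re-use the device for ifname if one exists; otherwise create it and
 * link it at the head of the global list. Creating the device only once
 * is what would avoid binding the same address twice.
 */
static struct demo_dev *demo_dev_get(const char *ifname)
{
	struct demo_dev *dev;

	assert(ifname != NULL);

	dev = demo_dev_search(ifname);
	if (dev == NULL) {
		dev = calloc(1, sizeof(*dev));
		if (dev == NULL)
			return NULL;
		/* calloc() zeroed the buffer, so the copy stays NUL-terminated. */
		strncpy(dev->ifname, ifname, DEMO_IFNAME_MAX - 1);
		dev->next = demo_devs;
		demo_devs = dev;
	}
	dev->nnets++;
	return dev;
}
```

In this sketch, calling demo_dev_get("ib0") once per LNet returns the same object both times and only increments its user count, which mirrors the behaviour the description says was lost.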

      Reproducer with kernel module parameter:

      [root@snx11922n002 ~]# cat /etc/lustre/ip2nets.dat
      o2ib040(ib0) 10.12.0.*;
      o2ib041(ib0) 10.12.0.50;
      [root@snx11922n002 ~]# modprobe lnet
      [root@snx11922n002 ~]# lctl net up
      LNET configure error 100: Network is down
      [root@snx11922n002 ~]# dmesg | tail
      [604327.506043] alg: No test for adler32 (adler32-zlib)
      [604327.512517] alg: No test for crc32 (crc32-table)
      [604328.280286] LNet: live_router_check_interval and dead_router_check_interval have been deprecated. Use alive_router_check_interval instead. Ignoring these deprecated parameters.
      [604330.561491] LNet: 3809:0:(config.c:1641:lnet_inet_enumerate()) lnet: Ignoring interface eth2: it's down
      [604330.591143] LNet: Using FastReg for registration
      [604330.614353] LNet: Added LNI 10.12.0.50@o2ib40 [16/2048/0/0]
      [604330.621410] LNetError: 3809:0:(o2iblnd.c:2776:kiblnd_dev_failover()) Failed to bind ib0:10.12.0.50 to device(ffff881f96ff8000): -98
      [604330.636010] LNetError: 3809:0:(o2iblnd.c:3266:kiblnd_startup()) ko2iblnd: Can't initialize device: rc = -98
      [604330.647163] LNetError: 105-4: Error -100 starting up LNI o2ib
      [604331.659240] LNet: Removed LNI 10.12.0.50@o2ib40
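
The return codes in the log above are negative Linux errno values: -98 is EADDRINUSE and -100 is ENETDOWN. A small helper such as the following could be used to decode them when reading similar logs. It is a sketch that assumes Linux errno numbering, and demo_rc_to_text() and demo_is_addr_in_use() are hypothetical names, not part of Lustre or LNet.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Kernel code reports failures as negative errno values (rc = -98),
 * so negate the value before looking up its description.
 */
static const char *demo_rc_to_text(int rc)
{
	assert(rc <= 0);
	return strerror(-rc);
}

/* True when rc is the bind failure reported by kiblnd_dev_failover(). */
static int demo_is_addr_in_use(int rc)
{
	return rc == -EADDRINUSE;
}
```

The -100 that lctl and lnetctl report is the generic startup failure for the LNI; the -98 logged by kiblnd_dev_failover() is the underlying cause.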
      

      Reproducer with lnetctl:

      [root@snx11922n002 ~]# modprobe lnet
      [root@snx11922n002 ~]# lctl mark mark
      [root@snx11922n002 ~]# lnetctl lnet configure
      [root@snx11922n002 ~]# lnetctl net add --net o2ib040 --if ib0
      [root@snx11922n002 ~]# lnetctl net add --net o2ib041 --if ib0
      add:
          - net:
                errno: -100
                descr: "cannot add network: Network is down"
      [root@snx11922n002 ~]# dmesg | tail
      [604760.221364] alg: No test for crc32 (crc32-table)
      [604760.983433] LNet: live_router_check_interval and dead_router_check_interval have been deprecated. Use alive_router_check_interval instead. Ignoring these deprecated parameters.
      [604763.557036] Lustre: DEBUG MARKER: mark
      [604777.372005] LNet: 7487:0:(config.c:1641:lnet_inet_enumerate()) lnet: Ignoring interface eth2: it's down
      [604777.382924] LNet: Using FastReg for registration
      [604777.402400] LNet: Added LNI 10.12.0.50@o2ib40 [16/2048/0/0]
      [604781.025699] LNet: 7528:0:(config.c:1641:lnet_inet_enumerate()) lnet: Ignoring interface eth2: it's down
      [604781.036209] LNetError: 7528:0:(o2iblnd.c:2776:kiblnd_dev_failover()) Failed to bind ib0:10.12.0.50 to device(ffff881f96ff8000): -98
      [604781.050103] LNetError: 7528:0:(o2iblnd.c:3266:kiblnd_startup()) ko2iblnd: Can't initialize device: rc = -98
      [604781.060933] LNetError: 105-4: Error -100 starting up LNI o2ib
      [root@snx11922n002 ~]#
      

        Activity

          Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36547/
          Subject: LU-12824 o2ib: Record rc in debug log on startup failure
          Project: fs/lustre-release
          Branch: b2_12
          Current Patch Set:
          Commit: 5ddc9b21b975518d548474d82fae72be6832b0c2

          Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36546/
          Subject: LU-12824 o2ib: Fix whitespace in kiblnd_startup
          Project: fs/lustre-release
          Branch: b2_12
          Current Patch Set:
          Commit: adfb05766dec3ae1c7fc082600be3d00db2e25e1

          Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36545/
          Subject: LU-12824 o2ib: Reintroduce kiblnd_dev_search
          Project: fs/lustre-release
          Branch: b2_12
          Current Patch Set:
          Commit: ddc3b77811f402315e390f463cff6bf517c35a8c

          Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36547
          Subject: LU-12824 o2ib: Record rc in debug log on startup failure
          Project: fs/lustre-release
          Branch: b2_12
          Current Patch Set: 1
          Commit: 9d021ae9f819f8a15812c90af33a0604452b4bf9

          Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36546
          Subject: LU-12824 o2ib: Fix whitespace in kiblnd_startup
          Project: fs/lustre-release
          Branch: b2_12
          Current Patch Set: 1
          Commit: 1f04b73ce39a9d181d0ba689bbaf993f348ea250

          Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36545
          Subject: LU-12824 o2ib: Reintroduce kiblnd_dev_search
          Project: fs/lustre-release
          Branch: b2_12
          Current Patch Set: 1
          Commit: fe6666b21f421d0fd948489ce8d30c007f5d94f1

          pjones Peter Jones added a comment -

          I have marked it as a candidate for a future 2.12.x release.

          apargal Alex Parga added a comment -

          Is this fix expected to land for 2.12?

          pjones Peter Jones added a comment -

          Landed for 2.13


          Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36326/
          Subject: LU-12824 o2ib: Reintroduce kiblnd_dev_search
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: e25e45c612a061031e8b4b5233137fbb57b50cc4

          People

            hornc Chris Horn