
ko2iblnd can't be loaded in 5.15.0-105-generic kernel

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.16.0

    Description

      The Lustre ko2iblnd module can be compiled against 5.15.0-105-generic and ofed-23.10-1.1.9, but it cannot be loaded because of symbol version mismatch errors.

      # git log --oneline -1
      2333c8e3ae (HEAD -> master, origin/master, origin/HEAD) LU-17744 ldiskfs: mballoc stats fixes
      
      # uname  -r
      5.15.0-105-generic
      
      # ofed_info -n
      23.10-1.1.9
      
      # git clean -d -x -f; sh ./autogen.sh ; ./configure --with-linux=/usr/src/linux-headers-5.15.0-105-generic --with-o2ib=/usr/src/ofa_kernel/x86_64/5.15.0-105-generic; make debs
      
      # modprobe lustre
      modprobe: ERROR: could not insert 'lustre': Invalid argument
      
      Apr 30 10:03:30 ggpu02 kernel: [422298.241699] libcfs: HW NUMA nodes: 1, HW CPU cores: 96, npartitions: 24
      Apr 30 10:03:30 ggpu02 kernel: [422298.254189] Lustre: Lustre: Build Version: 2.15.62_58_g2333c8e
      Apr 30 10:03:30 ggpu02 kernel: [422298.281629] ko2iblnd: disagrees about version of symbol __ib_alloc_pd
      Apr 30 10:03:30 ggpu02 kernel: [422298.281632] ko2iblnd: Unknown symbol __ib_alloc_pd (err -22)
      Apr 30 10:03:30 ggpu02 kernel: [422298.281649] ko2iblnd: disagrees about version of symbol rdma_resolve_addr
      Apr 30 10:03:30 ggpu02 kernel: [422298.281650] ko2iblnd: Unknown symbol rdma_resolve_addr (err -22)
      Apr 30 10:03:30 ggpu02 kernel: [422298.281656] ko2iblnd: disagrees about version of symbol rdma_set_service_type
      Apr 30 10:03:30 ggpu02 kernel: [422298.281656] ko2iblnd: Unknown symbol rdma_set_service_type (err -22)
      Apr 30 10:03:30 ggpu02 kernel: [422298.281678] ko2iblnd: disagrees about version of symbol ib_dereg_mr_user
      Apr 30 10:03:30 ggpu02 kernel: [422298.281678] ko2iblnd: Unknown symbol ib_dereg_mr_user (err -22)
      Apr 30 10:03:30 ggpu02 kernel: [422298.281687] ko2iblnd: disagrees about version of symbol rdma_reject
      Apr 30 10:03:30 ggpu02 kernel: [422298.281687] ko2iblnd: Unknown symbol rdma_reject (err -22)
      Apr 30 10:03:30 ggpu02 kernel: [422298.281696] ko2iblnd: disagrees about version of symbol rdma_disconnect
      Apr 30 10:03:30 ggpu02 kernel: [422298.281696] ko2iblnd: Unknown symbol rdma_disconnect (err -22)
      Apr 30 10:03:30 ggpu02 kernel: [422298.281723] ko2iblnd: disagrees about version of symbol __rdma_create_kernel_id
      Apr 30 10:03:30 ggpu02 kernel: [422298.281724] ko2iblnd: Unknown symbol __rdma_create_kernel_id (err -22)
      Apr 30 10:03:30 ggpu02 kernel: [422298.281729] ko2iblnd: disagrees about version of symbol ib_register_event_handler
      Apr 30 10:03:30 ggpu02 kernel: [422298.281729] ko2iblnd: Unknown symbol ib_register_event_handler (err -22)
      Apr 30 10:03:30 ggpu02 kernel: [422298.281746] ko2iblnd: disagrees about version of symbol rdma_resolve_route
      Apr 30 10:03:30 ggpu02 kernel: [422298.281746] ko2iblnd: Unknown symbol rdma_resolve_route (err -22)
      Apr 30 10:03:30 ggpu02 kernel: [422298.281750] ko2iblnd: disagrees about version of symbol ib_unregister_event_handler
      Apr 30 10:03:30 ggpu02 kernel: [422298.281751] ko2iblnd: Unknown symbol ib_unregister_event_handler (err -22)
      Apr 30 10:03:30 ggpu02 kernel: [422298.281758] ko2iblnd: disagrees about version of symbol rdma_bind_addr
      Apr 30 10:03:30 ggpu02 kernel: [422298.281758] ko2iblnd: Unknown symbol rdma_bind_addr (err -22)
      Apr 30 10:03:30 ggpu02 kernel: [422298.281766] ko2iblnd: disagrees about version of symbol rdma_create_qp
      Apr 30 10:03:30 ggpu02 kernel: [422298.281767] ko2iblnd: Unknown symbol rdma_create_qp (err -22)
      Apr 30 10:03:30 ggpu02 kernel: [422298.281773] ko2iblnd: disagrees about version of symbol ib_map_mr_sg
      Apr 30 10:03:30 ggpu02 kernel: [422298.281774] ko2iblnd: Unknown symbol ib_map_mr_sg (err -22)
      Apr 30 10:03:30 ggpu02 kernel: [422298.281783] ko2iblnd: disagrees about version of symbol ib_query_port
      Apr 30 10:03:30 ggpu02 kernel: [422298.281783] ko2iblnd: Unknown symbol ib_query_port (err -22)
      Apr 30 10:03:30 ggpu02 kernel: [422298.281785] ko2iblnd: disagrees about version of symbol rdma_notify
      Apr 30 10:03:30 ggpu02 kernel: [422298.281785] ko2iblnd: Unknown symbol rdma_notify (err -22)
      Apr 30 10:03:30 ggpu02 kernel: [422298.281791] ko2iblnd: disagrees about version of symbol rdma_listen
      Apr 30 10:03:30 ggpu02 kernel: [422298.281792] ko2iblnd: Unknown symbol rdma_listen (err -22)
      Apr 30 10:03:30 ggpu02 kernel: [422298.281793] ko2iblnd: disagrees about version of symbol rdma_destroy_qp
      Apr 30 10:03:30 ggpu02 kernel: [422298.281794] ko2iblnd: Unknown symbol rdma_destroy_qp (err -22)
      Apr 30 10:03:30 ggpu02 kernel: [422298.281798] ko2iblnd: disagrees about version of symbol __ib_create_cq
      Apr 30 10:03:30 ggpu02 kernel: [422298.281798] ko2iblnd: Unknown symbol __ib_create_cq (err -22)
      Apr 30 10:03:30 ggpu02 kernel: [422298.281805] ko2iblnd: disagrees about version of symbol ib_alloc_mr
      Apr 30 10:03:30 ggpu02 kernel: [422298.281806] ko2iblnd: Unknown symbol ib_alloc_mr (err -22)
      Apr 30 10:03:30 ggpu02 kernel: [422298.281810] ko2iblnd: disagrees about version of symbol rdma_connect_locked
      Apr 30 10:03:30 ggpu02 kernel: [422298.281811] ko2iblnd: Unknown symbol rdma_connect_locked (err -22)
      Apr 30 10:03:30 ggpu02 kernel: [422298.281812] ko2iblnd: disagrees about version of symbol rdma_set_reuseaddr
      Apr 30 10:03:30 ggpu02 kernel: [422298.281813] ko2iblnd: Unknown symbol rdma_set_reuseaddr (err -22)
      Apr 30 10:03:30 ggpu02 kernel: [422298.281815] ko2iblnd: disagrees about version of symbol ib_destroy_cq_user
      Apr 30 10:03:30 ggpu02 kernel: [422298.281816] ko2iblnd: Unknown symbol ib_destroy_cq_user (err -22)
      Apr 30 10:03:30 ggpu02 kernel: [422298.281821] ko2iblnd: disagrees about version of symbol ib_modify_qp
      Apr 30 10:03:30 ggpu02 kernel: [422298.281822] ko2iblnd: Unknown symbol ib_modify_qp (err -22)
      Apr 30 10:03:30 ggpu02 kernel: [422298.281835] ko2iblnd: disagrees about version of symbol ib_dma_virt_map_sg
      Apr 30 10:03:30 ggpu02 kernel: [422298.281835] ko2iblnd: Unknown symbol ib_dma_virt_map_sg (err -22)
      Apr 30 10:03:30 ggpu02 kernel: [422298.281837] ko2iblnd: disagrees about version of symbol rdma_destroy_id
      Apr 30 10:03:30 ggpu02 kernel: [422298.281838] ko2iblnd: Unknown symbol rdma_destroy_id (err -22)
      Apr 30 10:03:30 ggpu02 kernel: [422298.281843] ko2iblnd: disagrees about version of symbol rdma_accept
      Apr 30 10:03:30 ggpu02 kernel: [422298.281844] ko2iblnd: Unknown symbol rdma_accept (err -22)
      Apr 30 10:03:30 ggpu02 kernel: [422298.281851] ko2iblnd: disagrees about version of symbol ib_dealloc_pd_user
      Apr 30 10:03:30 ggpu02 kernel: [422298.281852] ko2iblnd: Unknown symbol ib_dealloc_pd_user (err -22)
      Apr 30 10:03:30 ggpu02 kernel: [422298.314513] LNetError: 2113531:0:(api-ni.c:2609:lnet_load_lnd()) Can't load LND o2ib, module ko2iblnd, rc=256
      Apr 30 10:03:30 ggpu02 kernel: [422298.325808] LustreError: 2113531:0:(events.c:640:ptlrpc_init_portals()) network initialisation failed: rc = -22
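
To pull just the affected symbol names out of a log like the one above, a small filter is enough. This is a minimal sketch and not part of the ticket: the function name `mismatched_symbols` is hypothetical, and it assumes the kernel message wording shown in the log ("disagrees about version of symbol").

```shell
#!/bin/sh
# Hypothetical helper (not from the ticket): print each symbol that the kernel
# reported with "disagrees about version of symbol", once, in sorted order.
# Reads kernel log text on stdin.
mismatched_symbols() {
    sed -n 's/.*disagrees about version of symbol \([A-Za-z0-9_]*\).*/\1/p' |
        LC_ALL=C sort -u
}

# Example use (needs permission to read the kernel log):
#   dmesg | mismatched_symbols
```

Every symbol in the log above is an RDMA core export (`ib_*`, `rdma_*`), which suggests the module was built against a different set of symbol versions than the RDMA stack that is actually loaded.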
      

        Activity

          [LU-17791] ko2iblnd can't be loaded in 5.15.0-105-generic kernel
          pjones Peter Jones added a comment -

          Merged for 2.16

          gerrit Gerrit Updater added a comment -

          "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/54984/
          Subject: LU-17791 build: use external o2ib path for ko2iblnd.ko
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: 2a107b4d32ce6973905aaab596028b7368b094b1

          stancheff Shaun Tancheff added a comment -

          Sorry, it took a while to figure out what was missed.

          It looks like the build is not finding the external o2ib Module.symvers and is falling back to the kernel one.
          Please see if the patch fixes your build (it appears to work here).
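
The diagnosis above can be checked without loading anything by comparing the CRCs a built ko2iblnd.ko expects with those recorded in a given Module.symvers. The following is a minimal sketch, not taken from the patch: `crc_mismatches` is a hypothetical name, and it assumes that `modprobe --dump-modversions` and Module.symvers both print the CRC in the first column and the symbol name in the second.

```shell
#!/bin/sh
# Hypothetical helper (not from the ticket or the patch).
# stdin: "<crc> <symbol>" lines, the CRCs the module expects
# $1:    a Module.symvers file ("<crc> <symbol> <module> <export type> ...")
# Prints every symbol whose CRC in $1 differs from the one on stdin.
crc_mismatches() {
    # An empty first file would break the NR == FNR idiom used below.
    [ -s "$1" ] || { echo "crc_mismatches: $1 is missing or empty" >&2; return 2; }
    awk 'NR == FNR { crc[$2] = $1; next }                   # first file: Module.symvers
         ($2 in crc) && (crc[$2] "") != ($1 "") { print $2 } # stdin: module expectations
        ' "$1" -
}

# Example use, assuming both directories from the configure line hold a Module.symvers:
#   modprobe --dump-modversions ko2iblnd.ko |
#       crc_mismatches /usr/src/ofa_kernel/x86_64/5.15.0-105-generic/Module.symvers
#   modprobe --dump-modversions ko2iblnd.ko |
#       crc_mismatches /usr/src/linux-headers-5.15.0-105-generic/Module.symvers
```

If the first command lists the symbols from the kernel log and the second prints nothing, the module was built with the kernel's symbol versions rather than the external OFED ones, which is what the comment above describes.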

          gerrit Gerrit Updater added a comment -

          "Shaun Tancheff <shaun.tancheff@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/54984
          Subject: LU-17791 build: use external o2ib path for ko2iblnd.ko
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 9688600bdd8fc297648eff0232b3df21671b757f

          yujian Jian Yu added a comment -

          Hi stancheff, could you please advise?

          sihara Shuichi Ihara added a comment -

          I see. LU-16967 introduced separate ko2iblnd packages for in-kernel and external OFED.
          However, the mix of new and old syntax is a bit confusing.
          Exactly which options does ko2iblnd need for external OFED?
          Does configure not check anything? In the end, configure still passes even when the path points to external OFED?

          sihara Shuichi Ihara added a comment -

          Actually, I've confirmed that this regression started with
          commit 8b1d2a72f1 ("LU-16967 build: Add in-kernel-ko2iblnd driver").

          ko2iblnd.ko compiled against external OFED on the Ubuntu 5.15.0-105-generic kernel worked well up to the commit immediately before it (commit 3554923af9, "LU-10283 mdd: fix parent FID in changelog of striped directory").

          sihara Shuichi Ihara added a comment -

          I narrowed it down a little. At least commit "6521c313f7 New tag 2.15.59" worked.

          # lctl get_param version
          version=2.15.59
          # lctl list_nids
          10.0.13.222@o2ib22

          However, tag 2.15.60 shows the same problem, so the regression was introduced between 2.15.59 and 2.15.60. I will try to find the exact commit where it started.
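
A search between a known-good and a known-bad commit like this one can be automated with `git bisect run`, which treats exit code 0 as good, 125 as "skip this commit", and other codes from 1 to 127 as bad. The helper below is a minimal sketch, not what was actually run for this ticket: `bisect_verdict` is a hypothetical name, and the build and load steps in the usage comment are placeholders.

```shell
#!/bin/sh
# Hypothetical helper for a `git bisect run` script (not from the ticket).
# bisect_verdict BUILD_RC LOAD_RC prints the exit code the run script should
# return: 125 = skip (commit could not be built), 1 = bad, 0 = good.
bisect_verdict() {
    if [ "$1" -ne 0 ]; then
        echo 125
    elif [ "$2" -ne 0 ]; then
        echo 1
    else
        echo 0
    fi
}

# Sketch of use (6521c313f7 is the "New tag 2.15.59" commit reported as working):
#   git bisect start <bad-commit> 6521c313f7
#   git bisect run ./try-build-and-load.sh
# where the hypothetical try-build-and-load.sh builds and installs the tree,
# records that status as build_rc, runs `modprobe ko2iblnd`, records that
# status as load_rc, and ends with:
#   exit "$(bisect_verdict "$build_rc" "$load_rc")"
```

Returning 125 for unbuildable commits keeps unrelated build breakage from being blamed for the load failure.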
          sihara Shuichi Ihara added a comment -

          # ./mlnxofedinstall --add-kernel-support --kmp --all --force

          Unsupported installation option: '--kmp'
          To see list of supported options, run: ./mlnxofedinstall --help

          # ./mlnxofedinstall --help | grep kmp
          #

          It looks like there is no longer a '--kmp' option.

          I removed --kmp and ran the installer with the following options.

          # ./mlnxofedinstall --add-kernel-support --all --force
          # /etc/init.d/openibd

          I uninstalled the installed lustre-client-modules-5.15.0-105-generic and lustre-client-utils debs, rebuilt the Lustre deb packages against the re-installed OFED, and re-installed them, but still got the same symbol mismatch.

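
When a reinstall does not change the result, one thing worth ruling out is which RDMA core modules modprobe would actually load. The sketch below is an editorial illustration, not a step from the ticket: `module_origin` is a hypothetical name, and it assumes that in-tree modules live under `/lib/modules/<release>/kernel/` while externally built ones are installed elsewhere under `/lib/modules/<release>/` (for example in an updates directory).

```shell
#!/bin/sh
# Hypothetical helper (not from the ticket): classify a module file path as
# in-tree, external, or unknown, based only on where it is installed.
module_origin() {
    rel=${1#/lib/modules/*/}          # path relative to the release directory
    case "$rel" in
        "$1")     echo unknown ;;     # not under /lib/modules/<release>/
        kernel/*) echo in-tree ;;
        *)        echo external ;;
    esac
}

# Example use:
#   module_origin "$(modinfo -n ib_core)"
```

If this reports `external` for ib_core while ko2iblnd still fails to load, the installed OFED modules are likely not the problem and attention shifts to which Module.symvers the Lustre build used.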
          yujian Jian Yu added a comment -

          Could you please try "./mlnxofedinstall --add-kernel-support --kmp --all --force"?

          People

            stancheff Shaun Tancheff
            sihara Shuichi Ihara