
ko2iblnd can't be loaded in 5.15.0-105-generic kernel

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.16.0

    Description

      The Lustre ko2iblnd module can be compiled against 5.15.0-105-generic and ofed-23.10-1.1.9, but it cannot be loaded because of symbol version mismatch errors.

      # git log --oneline -1
      2333c8e3ae (HEAD -> master, origin/master, origin/HEAD) LU-17744 ldiskfs: mballoc stats fixes
      
      # uname  -r
      5.15.0-105-generic
      
      # ofed_info -n
      23.10-1.1.9
      
      # git clean -d -x -f; sh ./autogen.sh ; ./configure --with-linux=/usr/src/linux-headers-5.15.0-105-generic --with-o2ib=/usr/src/ofa_kernel/x86_64/5.15.0-105-generic; make debs
      
      # modprobe lustre
      modprobe: ERROR: could not insert 'lustre': Invalid argument
      
      Apr 30 10:03:30 ggpu02 kernel: [422298.241699] libcfs: HW NUMA nodes: 1, HW CPU cores: 96, npartitions: 24
      Apr 30 10:03:30 ggpu02 kernel: [422298.254189] Lustre: Lustre: Build Version: 2.15.62_58_g2333c8e
      Apr 30 10:03:30 ggpu02 kernel: [422298.281629] ko2iblnd: disagrees about version of symbol __ib_alloc_pd
      Apr 30 10:03:30 ggpu02 kernel: [422298.281632] ko2iblnd: Unknown symbol __ib_alloc_pd (err -22)
      Apr 30 10:03:30 ggpu02 kernel: [422298.281649] ko2iblnd: disagrees about version of symbol rdma_resolve_addr
      Apr 30 10:03:30 ggpu02 kernel: [422298.281650] ko2iblnd: Unknown symbol rdma_resolve_addr (err -22)
      Apr 30 10:03:30 ggpu02 kernel: [422298.281656] ko2iblnd: disagrees about version of symbol rdma_set_service_type
      Apr 30 10:03:30 ggpu02 kernel: [422298.281656] ko2iblnd: Unknown symbol rdma_set_service_type (err -22)
      Apr 30 10:03:30 ggpu02 kernel: [422298.281678] ko2iblnd: disagrees about version of symbol ib_dereg_mr_user
      Apr 30 10:03:30 ggpu02 kernel: [422298.281678] ko2iblnd: Unknown symbol ib_dereg_mr_user (err -22)
      Apr 30 10:03:30 ggpu02 kernel: [422298.281687] ko2iblnd: disagrees about version of symbol rdma_reject
      Apr 30 10:03:30 ggpu02 kernel: [422298.281687] ko2iblnd: Unknown symbol rdma_reject (err -22)
      Apr 30 10:03:30 ggpu02 kernel: [422298.281696] ko2iblnd: disagrees about version of symbol rdma_disconnect
      Apr 30 10:03:30 ggpu02 kernel: [422298.281696] ko2iblnd: Unknown symbol rdma_disconnect (err -22)
      Apr 30 10:03:30 ggpu02 kernel: [422298.281723] ko2iblnd: disagrees about version of symbol __rdma_create_kernel_id
      Apr 30 10:03:30 ggpu02 kernel: [422298.281724] ko2iblnd: Unknown symbol __rdma_create_kernel_id (err -22)
      Apr 30 10:03:30 ggpu02 kernel: [422298.281729] ko2iblnd: disagrees about version of symbol ib_register_event_handler
      Apr 30 10:03:30 ggpu02 kernel: [422298.281729] ko2iblnd: Unknown symbol ib_register_event_handler (err -22)
      Apr 30 10:03:30 ggpu02 kernel: [422298.281746] ko2iblnd: disagrees about version of symbol rdma_resolve_route
      Apr 30 10:03:30 ggpu02 kernel: [422298.281746] ko2iblnd: Unknown symbol rdma_resolve_route (err -22)
      Apr 30 10:03:30 ggpu02 kernel: [422298.281750] ko2iblnd: disagrees about version of symbol ib_unregister_event_handler
      Apr 30 10:03:30 ggpu02 kernel: [422298.281751] ko2iblnd: Unknown symbol ib_unregister_event_handler (err -22)
      Apr 30 10:03:30 ggpu02 kernel: [422298.281758] ko2iblnd: disagrees about version of symbol rdma_bind_addr
      Apr 30 10:03:30 ggpu02 kernel: [422298.281758] ko2iblnd: Unknown symbol rdma_bind_addr (err -22)
      Apr 30 10:03:30 ggpu02 kernel: [422298.281766] ko2iblnd: disagrees about version of symbol rdma_create_qp
      Apr 30 10:03:30 ggpu02 kernel: [422298.281767] ko2iblnd: Unknown symbol rdma_create_qp (err -22)
      Apr 30 10:03:30 ggpu02 kernel: [422298.281773] ko2iblnd: disagrees about version of symbol ib_map_mr_sg
      Apr 30 10:03:30 ggpu02 kernel: [422298.281774] ko2iblnd: Unknown symbol ib_map_mr_sg (err -22)
      Apr 30 10:03:30 ggpu02 kernel: [422298.281783] ko2iblnd: disagrees about version of symbol ib_query_port
      Apr 30 10:03:30 ggpu02 kernel: [422298.281783] ko2iblnd: Unknown symbol ib_query_port (err -22)
      Apr 30 10:03:30 ggpu02 kernel: [422298.281785] ko2iblnd: disagrees about version of symbol rdma_notify
      Apr 30 10:03:30 ggpu02 kernel: [422298.281785] ko2iblnd: Unknown symbol rdma_notify (err -22)
      Apr 30 10:03:30 ggpu02 kernel: [422298.281791] ko2iblnd: disagrees about version of symbol rdma_listen
      Apr 30 10:03:30 ggpu02 kernel: [422298.281792] ko2iblnd: Unknown symbol rdma_listen (err -22)
      Apr 30 10:03:30 ggpu02 kernel: [422298.281793] ko2iblnd: disagrees about version of symbol rdma_destroy_qp
      Apr 30 10:03:30 ggpu02 kernel: [422298.281794] ko2iblnd: Unknown symbol rdma_destroy_qp (err -22)
      Apr 30 10:03:30 ggpu02 kernel: [422298.281798] ko2iblnd: disagrees about version of symbol __ib_create_cq
      Apr 30 10:03:30 ggpu02 kernel: [422298.281798] ko2iblnd: Unknown symbol __ib_create_cq (err -22)
      Apr 30 10:03:30 ggpu02 kernel: [422298.281805] ko2iblnd: disagrees about version of symbol ib_alloc_mr
      Apr 30 10:03:30 ggpu02 kernel: [422298.281806] ko2iblnd: Unknown symbol ib_alloc_mr (err -22)
      Apr 30 10:03:30 ggpu02 kernel: [422298.281810] ko2iblnd: disagrees about version of symbol rdma_connect_locked
      Apr 30 10:03:30 ggpu02 kernel: [422298.281811] ko2iblnd: Unknown symbol rdma_connect_locked (err -22)
      Apr 30 10:03:30 ggpu02 kernel: [422298.281812] ko2iblnd: disagrees about version of symbol rdma_set_reuseaddr
      Apr 30 10:03:30 ggpu02 kernel: [422298.281813] ko2iblnd: Unknown symbol rdma_set_reuseaddr (err -22)
      Apr 30 10:03:30 ggpu02 kernel: [422298.281815] ko2iblnd: disagrees about version of symbol ib_destroy_cq_user
      Apr 30 10:03:30 ggpu02 kernel: [422298.281816] ko2iblnd: Unknown symbol ib_destroy_cq_user (err -22)
      Apr 30 10:03:30 ggpu02 kernel: [422298.281821] ko2iblnd: disagrees about version of symbol ib_modify_qp
      Apr 30 10:03:30 ggpu02 kernel: [422298.281822] ko2iblnd: Unknown symbol ib_modify_qp (err -22)
      Apr 30 10:03:30 ggpu02 kernel: [422298.281835] ko2iblnd: disagrees about version of symbol ib_dma_virt_map_sg
      Apr 30 10:03:30 ggpu02 kernel: [422298.281835] ko2iblnd: Unknown symbol ib_dma_virt_map_sg (err -22)
      Apr 30 10:03:30 ggpu02 kernel: [422298.281837] ko2iblnd: disagrees about version of symbol rdma_destroy_id
      Apr 30 10:03:30 ggpu02 kernel: [422298.281838] ko2iblnd: Unknown symbol rdma_destroy_id (err -22)
      Apr 30 10:03:30 ggpu02 kernel: [422298.281843] ko2iblnd: disagrees about version of symbol rdma_accept
      Apr 30 10:03:30 ggpu02 kernel: [422298.281844] ko2iblnd: Unknown symbol rdma_accept (err -22)
      Apr 30 10:03:30 ggpu02 kernel: [422298.281851] ko2iblnd: disagrees about version of symbol ib_dealloc_pd_user
      Apr 30 10:03:30 ggpu02 kernel: [422298.281852] ko2iblnd: Unknown symbol ib_dealloc_pd_user (err -22)
      Apr 30 10:03:30 ggpu02 kernel: [422298.314513] LNetError: 2113531:0:(api-ni.c:2609:lnet_load_lnd()) Can't load LND o2ib, module ko2iblnd, rc=256
      Apr 30 10:03:30 ggpu02 kernel: [422298.325808] LustreError: 2113531:0:(events.c:640:ptlrpc_init_portals()) network initialisation failed: rc = -22
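
The log above repeats the same two messages for every affected symbol, so it can help to reduce it to a plain list of symbols per module. The following is a minimal sketch in Python, assuming the kernel log has been saved as text; the function name `mismatched_symbols` is hypothetical, and the regular expression only targets the "disagrees about version of symbol" wording seen in this report.

```python
import re

# Matches lines such as:
#   "... ko2iblnd: disagrees about version of symbol __ib_alloc_pd"
DISAGREES = re.compile(r"(\S+): disagrees about version of symbol (\S+)")


def mismatched_symbols(log_text):
    """Return {module: [symbol, ...]} for each version-mismatch line, in log order."""
    result = {}
    for line in log_text.splitlines():
        match = DISAGREES.search(line)
        if match is None:
            continue
        module, symbol = match.groups()
        symbols = result.setdefault(module, [])
        if symbol not in symbols:  # skip repeats of the same symbol
            symbols.append(symbol)
    return result


if __name__ == "__main__":
    # Two symbol pairs copied from the log in this report.
    sample = (
        "Apr 30 10:03:30 ggpu02 kernel: [422298.281629] ko2iblnd: disagrees about version of symbol __ib_alloc_pd\n"
        "Apr 30 10:03:30 ggpu02 kernel: [422298.281632] ko2iblnd: Unknown symbol __ib_alloc_pd (err -22)\n"
        "Apr 30 10:03:30 ggpu02 kernel: [422298.281649] ko2iblnd: disagrees about version of symbol rdma_resolve_addr\n"
        "Apr 30 10:03:30 ggpu02 kernel: [422298.281650] ko2iblnd: Unknown symbol rdma_resolve_addr (err -22)\n"
    )
    for module, symbols in mismatched_symbols(sample).items():
        print(module, symbols)
```

The resulting symbol list can then be checked against the Module.symvers file that the build used.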
      

      Attachments

        Activity

          [LU-17791] ko2iblnd can't be loaded in 5.15.0-105-generic kernel
          sihara Shuichi Ihara added a comment -

          # ./mlnxofedinstall --add-kernel-support --kmp --all --force
          
          Unsupported installation option: '--kmp'
          To see list of supported options, run: ./mlnxofedinstall --help
          
          # ./mlnxofedinstall --help | grep kmp 
          #

          It looks like the '--kmp' option no longer exists.

          I removed --kmp and ran the installer with the following options.

          # ./mlnxofedinstall --add-kernel-support --all --force
          # /etc/init.d/openibd 
          

          I uninstalled the installed lustre-client-modules-5.15.0-105-generic and lustre-client-utils debs, rebuilt the Lustre deb packages against the re-installed OFED, and installed them again, but I still got the same symbol mismatch.

          yujian Jian Yu added a comment -

          Could you please try "./mlnxofedinstall --add-kernel-support --kmp --all --force"?


          sihara Shuichi Ihara added a comment -

          It is. Could you please comment how ofed-23.10-1.1.9 was installed? What command and options were used?

          # cd MLNX_OFED_LINUX-23.10-1.1.9.0-ubuntu22.04-x86_64
          # ./mlnxofedinstall -f
          

          Please also run "dpkg -S /usr/src/ofa_kernel/x86_64/5.15.0-105-generic/Module.symvers" to see which package contains it.

          # ls -l /usr/src/ofa_kernel/x86_64/5.15.0-105-generic/Module.symvers
          -rw-r--r-- 1 root root 91356 Apr 25 18:41 /usr/src/ofa_kernel/x86_64/5.15.0-105-generic/Module.symvers
          
          # dpkg -S /usr/src/ofa_kernel/x86_64/5.15.0-105-generic/Module.symvers 
          dpkg-query: no path found matching pattern /usr/src/ofa_kernel/x86_64/5.15.0-105-generic/Module.symvers
          

          I've attached /usr/src/ofa_kernel/x86_64/5.15.0-105-generic/Module.symvers

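
A "disagrees about version of symbol" message means that the CRC recorded in the module at build time differs from the CRC exported by whatever currently provides that symbol. One way to narrow this down is to compare the CRCs in the OFED Module.symvers against another Module.symvers, for example the one shipped with the kernel headers. The sketch below assumes the usual tab-separated layout (CRC, symbol, module, export type); the function names are hypothetical, the CRC values in the demo are made up, and this is a diagnostic aid rather than the fix applied in this ticket.

```python
def parse_symvers(lines):
    """Map symbol name -> CRC string from the lines of a Module.symvers file."""
    crcs = {}
    for line in lines:
        fields = line.split()
        # Assumed layout per line: CRC, symbol, providing module, export type.
        if len(fields) >= 2 and fields[0].startswith("0x"):
            crcs[fields[1]] = fields[0]
    return crcs


def crc_differences(build_crcs, other_crcs):
    """Return {symbol: (build_crc, other_crc)} for shared symbols whose CRCs differ."""
    shared = sorted(build_crcs.keys() & other_crcs.keys())
    return {
        name: (build_crcs[name], other_crcs[name])
        for name in shared
        if build_crcs[name] != other_crcs[name]
    }


if __name__ == "__main__":
    # CRC values here are invented purely to show the output shape.
    ofed = parse_symvers([
        "0x11111111\t__ib_alloc_pd\tdrivers/infiniband/core/ib_core\tEXPORT_SYMBOL\t\n",
        "0x22222222\trdma_listen\tdrivers/infiniband/core/rdma_cm\tEXPORT_SYMBOL\t\n",
    ])
    kernel = parse_symvers([
        "0xaaaaaaaa\t__ib_alloc_pd\tdrivers/infiniband/core/ib_core\tEXPORT_SYMBOL\t\n",
        "0x22222222\trdma_listen\tdrivers/infiniband/core/rdma_cm\tEXPORT_SYMBOL\t\n",
    ])
    print(crc_differences(ofed, kernel))
```

To run it on real files, pass open file objects to `parse_symvers`, for instance the OFED file listed above and, as an assumption about the local layout, the Module.symvers under /usr/src/linux-headers-5.15.0-105-generic.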
          yujian Jian Yu added a comment -

          It is. Could you please comment how ofed-23.10-1.1.9 was installed? What command and options were used?
          Please also run "dpkg -S /usr/src/ofa_kernel/x86_64/5.15.0-105-generic/Module.symvers" to see which package contains it.

          sihara Shuichi Ihara added a comment -

          # uname  -r
          5.15.0-105-generic
          
          # ls /usr/src/ofa_kernel/x86_64/5.15.0-105-generic/Module.symvers  -l
          -rw-r--r-- 1 root root 91356 Apr 25 18:41 /usr/src/ofa_kernel/x86_64/5.15.0-105-generic/Module.symvers
          
          # ./configure --with-linux=/usr/src/linux-headers-5.15.0-105-generic --with-o2ib=/usr/src/ofa_kernel/x86_64/5.15.0-105-generic
          
          - snip -
          checking whether to enable tunable backoff TCP support... yes
          checking if Linux kernel has tunable backoff TCP support... no
          checking if external o2iblnd needs to use Compat RDMA... yes
          checking whether to use any OFED backport headers... no
          checking whether to enable OpenIB gen2 support... yes
          configure: adding /usr/src/ofa_kernel/x86_64/5.15.0-105-generic/Module.symvers to external o2ib symbols
          checking if Linux kernel has kthread_worker... no
          checking whether to enable GNI lnd... no
          checking ext4 source directory...
          configure: Lustre kernel checks 

          This seems to be OK and as expected, doesn't it?

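
The manual check of the configure output can also be scripted. The following is a rough sketch, assuming the configure output has been captured as text; it pulls out every "configure: adding ... to ..." path and reports any that do not sit directly under the directory given to --with-o2ib. The function names are hypothetical.

```python
import os
import re

# Captures the path in lines such as:
#   "configure: adding /path/Module.symvers to external o2ib symbols"
ADDING = re.compile(r"^\s*configure: adding (\S+) to ", re.MULTILINE)


def added_symvers_paths(configure_output):
    """Return every path that configure reported adding, in order."""
    return ADDING.findall(configure_output)


def paths_outside(paths, o2ib_dir):
    """Return the paths that are not located directly in o2ib_dir."""
    expected = o2ib_dir.rstrip("/")
    return [p for p in paths if os.path.dirname(p) != expected]


if __name__ == "__main__":
    output = (
        "checking whether to enable OpenIB gen2 support... yes\n"
        "configure: adding /usr/src/ofa_kernel/x86_64/5.15.0-105-generic/Module.symvers"
        " to external o2ib symbols\n"
        "checking if Linux kernel has kthread_worker... no\n"
    )
    paths = added_symvers_paths(output)
    print(paths)
    print(paths_outside(paths, "/usr/src/ofa_kernel/x86_64/5.15.0-105-generic"))
```

An empty second list means the symbol file came from the expected OFED directory, which matches what the output above shows.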
          yujian Jian Yu added a comment -

          Hi sihara,
          Could you please check if Module.symvers is under /usr/src/ofa_kernel/x86_64/5.15.0-105-generic?
          And also check the "configure" log to see if the 'configure: adding ... to Symbol Path' info is correct.


          People

            stancheff Shaun Tancheff
            sihara Shuichi Ihara