[LU-6083] IB with Ubuntu 14.04 client Created: 06/Jan/15  Updated: 01/Jul/16  Resolved: 24/Jun/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.9.0

Type: Bug Priority: Minor
Reporter: Wang Shilong (Inactive) Assignee: Nathaniel Clark
Resolution: Fixed Votes: 0
Labels: patch
Environment:

Ubuntu 14.04
3.13.0-32-generic


Attachments: Text File config.log    
Issue Links:
Related
is related to LU-1706 Building debian modules for Lustre 2.... Resolved
is related to LU-5953 lustre[-dkms] needs to automatically ... Resolved
is related to LU-5628 Dealing with kernels that have lustre... Resolved
Severity: 3
Rank (Obsolete): 16928

 Description   

I compiled the Lustre client on Ubuntu 14.04. It works over TCP/IP, but
IB does not.

I downloaded the IB driver for Ubuntu 14.04 (kernel 3.13.0-32-generic) from:
http://www.mellanox.com/page/mlnx_ofed_eula?mtag=linux_sw_drivers&mrequest=downloads&mtype=ofed&mver=MLNX_OFED-2.3-2.0.0&mname=MLNX_OFED_LINUX-2.3-2.0.0-ubuntu14.04-x86_64.iso

The config.log from the failed build is attached. The first error is:
"
/usr/src/mlnx-ofed-kernel-2.3/include/linux/compat-2.6.h:17:35: fatal error: linux/compat_autoconf.h: No such file or directory
#include <linux/compat_autoconf.h>
"
If I skip this error by removing that include from the source file, I still hit the following error:

"
configure: error: an external source tree was specified for o2iblnd however I could not find a /usr/src/mlnx-ofed-kernel-2.3/Module.symvers there
"
If I create a Module.symvers there with touch (a small hack), the build finishes. After installing the resulting debs, running modprobe lustre with IB produces the following messages:

[422278.843073] ko2iblnd: Unknown symbol rdma_create_qp (err -22)
[422278.843080] ko2iblnd: disagrees about version of symbol ib_destroy_cq
[422278.843081] ko2iblnd: Unknown symbol ib_destroy_cq (err -22)
[422278.843084] ko2iblnd: disagrees about version of symbol rdma_create_id
[422278.843085] ko2iblnd: Unknown symbol rdma_create_id (err -22)
[422278.843101] ko2iblnd: disagrees about version of symbol rdma_listen
[422278.843103] ko2iblnd: Unknown symbol rdma_listen (err -22)
[422278.843105] ko2iblnd: disagrees about version of symbol rdma_destroy_qp
[422278.843107] ko2iblnd: Unknown symbol rdma_destroy_qp (err -22)
[422278.843113] ko2iblnd: disagrees about version of symbol ib_query_device
[422278.843115] ko2iblnd: Unknown symbol ib_query_device (err -22)
[422278.843119] ko2iblnd: disagrees about version of symbol ib_get_dma_mr
[422278.843120] ko2iblnd: Unknown symbol ib_get_dma_mr (err -22)
[422278.843131] ko2iblnd: disagrees about version of symbol ib_alloc_pd
[422278.843132] ko2iblnd: Unknown symbol ib_alloc_pd (err -22)
[422278.843143] ko2iblnd: disagrees about version of symbol rdma_set_reuseaddr
[422278.843144] ko2iblnd: Unknown symbol rdma_set_reuseaddr (err -22)
[422278.843148] ko2iblnd: disagrees about version of symbol rdma_connect
[422278.843149] ko2iblnd: Unknown symbol rdma_connect (err -22)
[422278.843154] ko2iblnd: disagrees about version of symbol ib_modify_qp
[422278.843156] ko2iblnd: Unknown symbol ib_modify_qp (err -22)
[422278.843168] ko2iblnd: disagrees about version of symbol rdma_destroy_id
[422278.843169] ko2iblnd: Unknown symbol rdma_destroy_id (err -22)
[422278.843174] ko2iblnd: disagrees about version of symbol rdma_accept
[422278.843176] ko2iblnd: Unknown symbol rdma_accept (err -22)
[422278.843189] ko2iblnd: disagrees about version of symbol ib_dealloc_pd
[422278.843190] ko2iblnd: Unknown symbol ib_dealloc_pd (err -22)
[422278.843195] ko2iblnd: disagrees about version of symbol ib_fmr_pool_map_phys
[422278.843196] ko2iblnd: Unknown symbol ib_fmr_pool_map_phys (err -22)
[422278.843452] LNetError: 29038:0:(api-ni.c:1515:lnet_startup_lndnis()) Can't load LND o2ib, module ko2iblnd, rc=256
[422278.853606] LustreError: 29038:0:(events.c:629:ptlrpc_init_portals()) network initialisation failed
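
A log like the one above is easier to act on once it is reduced to the distinct symbol names. The following is a minimal sketch, not part of Lustre or MLNX OFED: the helper names list_unresolved_symbols and symvers_crc are hypothetical, and symvers_crc assumes the Module.symvers file keeps the CRC in its first whitespace-separated column and the symbol name in its second.

```shell
#!/bin/sh
# Hypothetical helpers for triaging ko2iblnd symbol errors.

# Read kernel log text on stdin and print each symbol that ko2iblnd
# could not resolve, once, in sorted order.
list_unresolved_symbols() {
    sed -n 's/.*ko2iblnd: Unknown symbol \([A-Za-z0-9_]*\).*/\1/p' | sort -u
}

# Print the CRC recorded for symbol $2 in the Module.symvers file $1.
# Assumes column 1 is the CRC and column 2 is the symbol name.
symvers_crc() {
    awk -v sym="$2" '$2 == sym { print $1 }' "$1"
}
```

For example, `dmesg | list_unresolved_symbols` lists the symbols to look up. Running symvers_crc for one of them against the Module.symvers of the OFED tree and against the one the kernel headers provide should show whether the two stacks record different CRCs for that symbol.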

Could you take a look at this issue?



 Comments   
Comment by Wang Shilong (Inactive) [ 06/Jan/15 ]

BTW, we need to apply http://review.whamcloud.com/#/c/13129/ on master for a normal build to pass.

Comment by Peter Jones [ 06/Jan/15 ]

Bob

Could you please review this patch?

Thanks

Peter

Comment by Bob Glossman (Inactive) [ 06/Jan/15 ]

I suspect the problem comes from using Mellanox InfiniBand. Is this really necessary? As far as I know, Ubuntu has a perfectly fine in-kernel IB stack that is configured in by default.

Comment by Bob Glossman (Inactive) [ 06/Jan/15 ]

I don't know if it makes any difference, but the kernel version you call out, 3.13.0-32-generic, looks a bit obsolete. The current one in Ubuntu 14.04 is 3.13.0-0.43.

Comment by Bob Glossman (Inactive) [ 06/Jan/15 ]

When you say Lustre client, are you referring to the upstream Lustre client that is part of the Ubuntu kernel tree in drivers/staging/lustre, or the Lustre client from the community Lustre git tree, built on and for Ubuntu?

Comment by Wang Shilong (Inactive) [ 07/Jan/15 ]

Sorry for the incomplete information. I mean the client built from the git tree, master branch.

Comment by Wang Shilong (Inactive) [ 07/Jan/15 ]

BTW, I see a similar problem reported here:
https://jira.hpdd.intel.com/browse/LU-5597

We really need to use Mellanox InfiniBand, because the Ubuntu built-in IB did
not work for us.

I also tried again, and modprobe reported the following messages:
[ 7528.946399] ko2iblnd: no symbol version for ib_create_cq
[ 7528.946399] ko2iblnd: Unknown symbol ib_create_cq (err -22)
[ 7528.946409] ko2iblnd: no symbol version for rdma_resolve_addr
[ 7528.946410] ko2iblnd: Unknown symbol rdma_resolve_addr (err -22)
[ 7528.946414] ko2iblnd: no symbol version for ib_reg_phys_mr
[ 7528.946415] ko2iblnd: Unknown symbol ib_reg_phys_mr (err -22)
[ 7528.946419] ko2iblnd: no symbol version for ib_create_fmr_pool
[ 7528.946419] ko2iblnd: Unknown symbol ib_create_fmr_pool (err -22)
[ 7528.946426] ko2iblnd: no symbol version for ib_flush_fmr_pool
[ 7528.946427] ko2iblnd: Unknown symbol ib_flush_fmr_pool (err -22)
[ 7528.946439] ko2iblnd: no symbol version for ib_dereg_mr
[ 7528.946440] ko2iblnd: Unknown symbol ib_dereg_mr (err -22)
[ 7528.946443] ko2iblnd: no symbol version for rdma_reject
[ 7528.946444] ko2iblnd: Unknown symbol rdma_reject (err -22)
[ 7528.946448] ko2iblnd: no symbol version for rdma_disconnect
[ 7528.946449] ko2iblnd: Unknown symbol rdma_disconnect (err -22)
[ 7528.946471] ko2iblnd: no symbol version for rdma_resolve_route
[ 7528.946471] ko2iblnd: Unknown symbol rdma_resolve_route (err -22)
[ 7528.946476] ko2iblnd: no symbol version for rdma_bind_addr
[ 7528.946476] ko2iblnd: Unknown symbol rdma_bind_addr (err -22)
[ 7528.946478] ko2iblnd: no symbol version for rdma_create_qp
[ 7528.946479] ko2iblnd: Unknown symbol rdma_create_qp (err -22)
[ 7528.946483] ko2iblnd: no symbol version for ib_destroy_cq
[ 7528.946484] ko2iblnd: Unknown symbol ib_destroy_cq (err -22)
[ 7528.946486] ko2iblnd: no symbol version for rdma_create_id
[ 7528.946487] ko2iblnd: Unknown symbol rdma_create_id (err -22)
[ 7528.946496] ko2iblnd: no symbol version for rdma_listen
[ 7528.946497] ko2iblnd: Unknown symbol rdma_listen (err -22)
[ 7528.946499] ko2iblnd: no symbol version for rdma_destroy_qp
[ 7528.946500] ko2iblnd: Unknown symbol rdma_destroy_qp (err -22)
[ 7528.946504] ko2iblnd: no symbol version for ib_query_device
[ 7528.946505] ko2iblnd: Unknown symbol ib_query_device (err -22)
[ 7528.946507] ko2iblnd: no symbol version for ib_get_dma_mr
[ 7528.946508] ko2iblnd: Unknown symbol ib_get_dma_mr (err -22)
[ 7528.946514] ko2iblnd: no symbol version for ib_alloc_pd
[ 7528.946515] ko2iblnd: Unknown symbol ib_alloc_pd (err -22)
[ 7528.946522] ko2iblnd: no symbol version for rdma_set_reuseaddr
[ 7528.946523] ko2iblnd: Unknown symbol rdma_set_reuseaddr (err -22)
[ 7528.946525] ko2iblnd: no symbol version for rdma_connect
[ 7528.946526] ko2iblnd: Unknown symbol rdma_connect (err -22)
[ 7528.946529] ko2iblnd: no symbol version for ib_modify_qp
[ 7528.946530] ko2iblnd: Unknown symbol ib_modify_qp (err -22)
[ 7528.946537] ko2iblnd: no symbol version for ib_destroy_fmr_pool
[ 7528.946538] ko2iblnd: Unknown symbol ib_destroy_fmr_pool (err -22)
[ 7528.946540] ko2iblnd: no symbol version for rdma_destroy_id
[ 7528.946540] ko2iblnd: Unknown symbol rdma_destroy_id (err -22)
[ 7528.946543] ko2iblnd: no symbol version for rdma_accept
[ 7528.946544] ko2iblnd: Unknown symbol rdma_accept (err -22)
[ 7528.946553] ko2iblnd: no symbol version for ib_dealloc_pd
[ 7528.946553] ko2iblnd: Unknown symbol ib_dealloc_pd (err -22)
[ 7528.946556] ko2iblnd: no symbol version for ib_fmr_pool_map_phys
[ 7528.946557] ko2iblnd: Unknown symbol ib_fmr_pool_map_phys (err -22)
[ 7528.946987] LNetError: 25709:0:(api-ni.c:1515:lnet_startup_lndnis()) Can't load LND o2ib, module ko2iblnd, rc=256
[ 7529.024336] LustreError: 25709:0:(events.c:629:ptlrpc_init_portals()) network initialisation failed

Comment by Wang Shilong (Inactive) [ 07/Jan/15 ]

With this solution: https://github.com/ahlabenadam/lustre_fix.git

I can now load Lustre with IB successfully. Let me test it further.
In the long term, though, I think we had better fix this issue on the Lustre side.

Best Regards,
Wang Shilong

Comment by James A Simmons [ 07/Jan/15 ]

This sounds very similar to LU-5597

Comment by Nathaniel Clark [ 15/Aug/15 ]

FYI: Kernel compatibility on Ubuntu 14.04

Kernel   MLNX 2.4-1.0.4   MLNX 3.0-2.0.1
3.13     Yes              ?
3.16     No               Yes
3.19     No               ?
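
The compatibility notes above can be encoded as a small lookup so that a build script can refuse an unsupported combination early. This is only a sketch: the function name mlnx_kernel_support is hypothetical, it covers just the Ubuntu 14.04 combinations listed in this comment, and the "?" entries are reported as "unknown" rather than guessed.

```shell
#!/bin/sh
# Hypothetical lookup of the compatibility notes from this comment.
# $1 is the MLNX OFED version, $2 is the kernel series (for example 3.13).
# Prints yes, no, unknown (a "?" entry), or unlisted (not in the notes).
mlnx_kernel_support() {
    case "$1:$2" in
        2.4-1.0.4:3.13) echo yes ;;
        2.4-1.0.4:3.16|2.4-1.0.4:3.19) echo no ;;
        3.0-2.0.1:3.16) echo yes ;;
        3.0-2.0.1:3.13|3.0-2.0.1:3.19) echo unknown ;;
        *) echo unlisted ;;
    esac
}
```

A caller might pass the running kernel series with `mlnx_kernel_support 2.4-1.0.4 "$(uname -r | cut -d. -f1,2)"`.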

Comment by Nathaniel Clark [ 17/Aug/15 ]

Modified build instructions that do not need the ahlabenadam fix:

Install MLNX OFED as normal (the 3.13 kernel is required for MLNX 2.4-1.0.4).
As root:

cd /usr/src/ofa_kernel
./ofed_scripts/gen-compat-autoconf.sh include/linux/compat-3.13.h > include/linux/compat_autoconf.h
export MODULES_DIR=/lib/modules/$(uname -r)/updates/dkms/./
./ofed_scripts/create_Module.symvers.sh

In lustre-release:

./configure --with-o2ib=/usr/src/ofa_kernel --disable-server --enable-quota

NOTE: the compilation problem from LU-5628 still exists. It can be partially solved by adding --with-max-payload-mb=1 to the configure line and then editing config.h to replace ((1)<<20) with 1048576.
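
Before running configure, it may help to confirm that the two generated files really exist and are non-empty, since an empty Module.symvers (such as one created with touch) lets the build finish but leaves ko2iblnd without usable symbol versions. The sketch below is an illustration only: check_o2ib_tree and fix_max_payload are hypothetical helper names, and fix_max_payload assumes the literal text ((1)<<20) appears in config.h exactly as quoted in the note above.

```shell
#!/bin/sh
# Hypothetical pre-flight helpers; not part of lustre-release or MLNX OFED.

# Succeed only if the OFED source tree $1 holds non-empty copies of the
# two files generated by the steps above.
check_o2ib_tree() {
    tree=$1
    rc=0
    for f in include/linux/compat_autoconf.h Module.symvers; do
        if [ ! -s "$tree/$f" ]; then
            echo "missing or empty: $tree/$f" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Rewrite the literal text ((1)<<20) to 1048576 in the file $1,
# going through a temporary file so a failed sed leaves the original intact.
fix_max_payload() {
    tmp=$(mktemp) || return 1
    if sed 's/((1)<<20)/1048576/g' "$1" > "$tmp"; then
        cat "$tmp" > "$1"
        rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

A possible sequence is `check_o2ib_tree /usr/src/ofa_kernel` before the configure line, and `fix_max_payload config.h` in lustre-release after configuring with --with-max-payload-mb=1.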

Comment by James A Simmons [ 07/Oct/15 ]

What is left for this work besides adding Documentation? For me everything works well.

Comment by Nathaniel Clark [ 01/Jun/16 ]

This issue is handled by patch http://review.whamcloud.com/20523 linked to LU-5953

Comment by Nathaniel Clark [ 24/Jun/16 ]

Patch landed to master

Generated at Sat Feb 10 01:57:04 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.