[LU-10549] Cannot start lnet with MOFED 3.4 Created: 22/Jan/18  Updated: 02/Feb/18

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Minor
Reporter: Henri Doreau (Inactive) Assignee: Amir Shehata (Inactive)
Resolution: Unresolved Votes: 0
Labels: None

Rank (Obsolete): 9223372036854775807

 Description   

Since commit 31d6445718b896290198f5d127f86c174d499c6c, we are unable to load lnet when using MOFED 3.4:
[ 816.546805] LNetError: 28107:0:(o2iblnd.c:2519:kiblnd_hdev_get_attr()) Invalid mr size: 0x3fdf
[ 816.548804] LNetError: 28107:0:(o2iblnd.c:2739:kiblnd_dev_failover()) Can't setup device: -22
[ 816.550756] LNetError: 28107:0:(o2iblnd.c:2849:kiblnd_create_dev()) Can't initialize device: -22
[ 817.552196] LNetError: 105-4: Error -100 starting up LNI o2ib
[ 817.555577] LustreError: 28107:0:(events.c:630:ptlrpc_init_portals()) network initialisation failed

This seems to be related to the (changing) position of the ib_device->attr field. The field sits at a different offset in Lustre's view of the structure than in MOFED's, because odp_statistics is zero-length in the Lustre build.
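
To illustrate the kind of mismatch suspected here, the following is a minimal sketch in plain C. Every name in it (demo_attr, demo_odp_stats, demo_device_ofed, demo_device_consumer and the helper functions) is hypothetical and is not the real ib_device definition. The zero-length member is modelled by simply omitting it from the consumer's view of the structure. It shows how a consumer compiled against a layout without the statistics storage would read a different member when it dereferences attr.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins; these are NOT the real ib_device structures. */
struct demo_attr {
	uint64_t max_mr_size;
};

struct demo_odp_stats {
	uint64_t counters[4];
};

/* Layout as the provider (the OFED side in this sketch) compiled it:
 * the statistics member occupies real storage ahead of attr. */
struct demo_device_ofed {
	uint32_t id;
	struct demo_odp_stats odp_statistics;
	struct demo_attr attr;
};

/* Layout as the consumer compiled it: the statistics member takes no
 * space (modelled here by omitting it), so attr sits earlier. */
struct demo_device_consumer {
	uint32_t id;
	struct demo_attr attr;
};

size_t attr_offset_ofed(void)
{
	return offsetof(struct demo_device_ofed, attr);
}

size_t attr_offset_consumer(void)
{
	return offsetof(struct demo_device_consumer, attr);
}

/* Read max_mr_size from a provider-filled object, but using the offset
 * the consumer believes attr has. */
uint64_t read_mr_size_with_consumer_layout(const struct demo_device_ofed *dev)
{
	uint64_t value;
	const unsigned char *base = (const unsigned char *)dev;

	memcpy(&value,
	       base + offsetof(struct demo_device_consumer, attr)
		    + offsetof(struct demo_attr, max_mr_size),
	       sizeof(value));
	return value;
}

/* Small self-check: the consumer ends up reading the first statistics
 * counter instead of the real max_mr_size. */
void demo_check(void)
{
	struct demo_device_ofed dev;

	memset(&dev, 0, sizeof(dev));
	dev.odp_statistics.counters[0] = 0x1111;
	dev.attr.max_mr_size = UINT64_MAX;

	assert(attr_offset_ofed() != attr_offset_consumer());
	assert(read_mr_size_with_consumer_layout(&dev) == 0x1111);
}
```

Calling demo_check() from a test program exercises both assertions. If the same thing happens with the real structures, an unexpected value such as the "Invalid mr size" reported above would be one plausible symptom, though this sketch does not prove that this is the cause.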

Lustre should not ignore the (M)OFED options when --with-o2ib is specified on build.



 Comments   
Comment by Peter Jones [ 22/Jan/18 ]

Amir

Is the expectation that this should work even for older versions of MOFED?

Peter

Comment by Amir Shehata (Inactive) [ 02/Feb/18 ]

Did you rebuild Lustre against the specified MOFED version?

From your statement:

Lustre should not ignore the (M)OFED options when --with-o2ib is specified on build.

it seems like you have not. Or are you implying that you tried to rebuild, but the build ignored the option?

Generated at Sat Feb 10 02:36:04 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.