[LU-101] building Lustre against OFED when the kernel has built-in IB causes symbol mismatch Created: 28/Feb/11  Updated: 28/Jun/11  Resolved: 18/Mar/11

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 1.8.6
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Kit Westneat (Inactive) Assignee: Brian Murrell (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Severity: 4
Rank (Obsolete): 10337

 Description   

When building 1.8.x against an out-of-kernel OFED and a Lustre kernel with in-kernel IB support, Lustre picks up the kernel's IB symbol versions (from the kernel's Module.symvers) instead of OFED's. The workaround is to remove all references to IB from the kernel's Module.symvers, but it's a confusing step. It would be nice if Lustre did the right thing here and preferred the --with-o2ib symvers over the kernel symvers. I thought there had been some work on this in the past, but it doesn't seem to have stuck, as we still run into this issue regularly.

Thanks,
Kit

On an administrative note: which should we use to mark how important a bug is, Priority or Severity? This issue is not at all critical, since there is a well-understood workaround and it affects nothing but the build process.
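The entries the workaround above strips out are the IB symbols that appear in both the kernel's and OFED's Module.symvers with different CRCs. As a read-only alternative to editing the kernel's file, they can be listed with a small shell function. This is a minimal sketch, not part of the Lustre build: the name `symvers_conflicts` is hypothetical, and it assumes POSIX sh and awk plus tab-separated Module.symvers lines with the CRC in column 1 and the symbol name in column 2 (the layout shown in the grep output later in this ticket).

```shell
# symvers_conflicts (hypothetical helper): read-only comparison of two
# Module.symvers files. For every symbol exported by BOTH files with
# different CRCs, prints: symbol<TAB>crc-in-first-file<TAB>crc-in-second-file
# Nothing is modified. Assumes tab-separated lines, CRC in column 1 and
# symbol name in column 2.
# Usage: symvers_conflicts FIRST_SYMVERS SECOND_SYMVERS
symvers_conflicts() {
    awk -F'\t' '
        # First file (e.g. the kernel Module.symvers): remember each CRC.
        FILENAME == ARGV[1] { crc[$2] = $1; next }
        # Second file (e.g. the OFED Module.symvers): report shared
        # symbols whose CRC differs from the one seen in the first file.
        ($2 in crc) && crc[$2] != $1 { print $2 "\t" crc[$2] "\t" $1 }
    ' "$1" "$2"
}

# Example, using the paths that appear later in this ticket:
# symvers_conflicts /usr/src/kernels/2.6.18-194.32.1.el5-x86_64/Module.symvers \
#                   /usr/src/ofa_kernel/Module.symvers
```

An empty result means the two files agree on every symbol they share.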



 Comments   
Comment by Peter Jones [ 28/Feb/11 ]

Brian

You are the engineer who has worked in this area. Could you please comment?

Thanks

Peter

Comment by Peter Jones [ 28/Feb/11 ]

Kit

Priority = importance and Severity = impact. Usually these are in sync, but it is possible to have a low-priority, high-severity issue (a theoretical system crasher that never occurs in real life) or a high-priority, low-severity issue (Lustre misspelt somewhere very noticeable).

HTH

Peter

Comment by Brian Murrell (Inactive) [ 03/Mar/11 ]

Kit, from looking at the code (lnet/autoconf/lustre-lnet.m4:447 in current master) and the kernel's documentation for Module.symvers, AFAICT we are already doing the right thing, modulo the fact that the kernel's modules.txt doesn't specify which Module.symvers takes precedence when a symbol exists in both the kernel's own Module.symvers and an external module's Module.symvers.

As for having to remove the IB symbols from the kernel's Module.symvers, that is obviously not something the Lustre build code can do; it would be (IMHO) a serious problem to go tampering with the kernel source tree as part of a module build.

I will keep investigating to see if I can resolve the question of which one takes precedence when more than one Module.symvers exports a given symbol.

Comment by Brian Murrell (Inactive) [ 04/Mar/11 ]

I can't reproduce this.

I built OFED-1.5.2's kernel-ib{,-devel} packages and installed kernel-ib-devel. I then built Lustre with:

$ ./configure --with-linux=/usr/src/kernels/2.6.18-194.32.1.el5-x86_64/ --with-o2ib=/usr/src/ofa_kernel/ --without-lustre-iokit --disable-quilt --disable-liblustre --disable-docs --disable-snmp --disable-tests
$ make

When I interrogate lnet/klnds/o2iblnd/ko2iblnd.ko for the version (CRC) of a given IB function that it requires, I get:

$ /sbin/modprobe --dump-modversions lustre-release/lnet/klnds/o2iblnd/ko2iblnd.ko | grep ib_create_cq
0x17fa926d	ib_create_cq

When I then interrogate the kernel and OFED Module.symvers files:

$ grep ib_create_cq /usr/src/ofa_kernel/Module.symvers /usr/src/kernels/2.6.18-194.32.1.el5-x86_64/Module.symvers 
/usr/src/ofa_kernel/Module.symvers:0x17fa926d	ib_create_cq	drivers/infiniband/core/ib_core	EXPORT_SYMBOL
/usr/src/kernels/2.6.18-194.32.1.el5-x86_64/Module.symvers:0x1e8a9cb5	ib_create_cq	drivers/infiniband/core/ib_core	EXPORT_SYMBOL

So as you can see, ko2iblnd.ko is looking for the OFED version of the function (0x17fa926d), not the kernel's own version (0x1e8a9cb5).
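The same check can be extended from ib_create_cq to every symbol a module requires. The sketch below is a hypothetical helper (`which_symvers` is not an existing tool). It assumes POSIX sh and awk, and that both the --dump-modversions output and the Module.symvers files are tab-separated with the CRC in column 1 and the symbol name in column 2, as in the output above. For each required symbol it prints the Module.symvers file(s) whose entry carries the same CRC, or NO-MATCH.

```shell
# which_symvers (hypothetical helper): for every (crc, symbol) pair listed
# in DUMP, print the symbol followed by the Module.symvers file(s) that
# export it with the same CRC, or NO-MATCH if none do.
# Usage: which_symvers DUMP SYMVERS_FILE...
#   DUMP holds the output of: /sbin/modprobe --dump-modversions MODULE.ko
# Assumes tab-separated input: CRC in column 1, symbol name in column 2.
which_symvers() {
    dump=$1
    shift
    # The Module.symvers files are read first, the dump file last.
    awk -F'\t' -v dump="$dump" '
        # Module.symvers files: record which file(s) export each pair.
        FILENAME != dump {
            key = $1 SUBSEP $2
            if (key in owners) owners[key] = owners[key] " " FILENAME
            else owners[key] = FILENAME
            next
        }
        # Dump file: look up each (crc, symbol) pair the module requires.
        {
            key = $1 SUBSEP $2
            if (key in owners) print $2 "\t" owners[key]
            else print $2 "\tNO-MATCH"
        }
    ' "$@" "$dump"
}

# Example, using the paths from this comment:
# /sbin/modprobe --dump-modversions lnet/klnds/o2iblnd/ko2iblnd.ko > /tmp/ko2iblnd.crcs
# which_symvers /tmp/ko2iblnd.crcs /usr/src/ofa_kernel/Module.symvers \
#     /usr/src/kernels/2.6.18-194.32.1.el5-x86_64/Module.symvers
```

When both files carry the same CRC for a symbol, both are listed; a symbol reported only against the kernel's Module.symvers would suggest the mismatch described in this ticket.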

Can you provide any more information about how you are arriving at different results, or about where I may be misunderstanding your problem?

Comment by Brian Murrell (Inactive) [ 08/Mar/11 ]

Kit,

Were you able to gather any more information on this issue?

Comment by Kit Westneat (Inactive) [ 11/Mar/11 ]

Hey Brian,

Unfortunately, I haven't had time to work on this; I should be able to next week.

Thanks,
Kit

Comment by Kit Westneat (Inactive) [ 18/Mar/11 ]

It looks like it is working, so I'm not sure what was going on. I'll close it up.

Comment by Kit Westneat (Inactive) [ 18/Mar/11 ]

Actually, is it possible for me to close it? I can't find that option.

Comment by Peter Jones [ 18/Mar/11 ]

Not sure, but I have closed it for you, Kit.

Generated at Sat Feb 10 01:03:46 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.