[LU-101] building Lustre against OFED when the kernel has built-in IB causes symbol mismatch Created: 28/Feb/11 Updated: 28/Jun/11 Resolved: 18/Mar/11 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 1.8.6 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Kit Westneat (Inactive) | Assignee: | Brian Murrell (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Severity: | 4 |
| Rank (Obsolete): | 10337 |
| Description |
|
When building 1.8.x against out-of-kernel OFED and a Lustre kernel with in-kernel IB support, Lustre picks up the IB symbol versions from the kernel's Module.symvers. The workaround is to remove all references to IB from the kernel's Module.symvers, but it's a confusing step. It would be nice if Lustre did the right thing here and preferred the --with-o2ib symvers over the kernel symvers. I thought that there had been some work on this in the past, but it seems not to have stuck, as we still run into this issue regularly. Thanks. On an administrative note: which should we use to mark how important a bug is, priority or severity? This issue is not at all critical, since there is a well-understood workaround and it doesn't affect anything except the build process. |
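The manual workaround described above could be sketched as a small filter. This is a minimal sketch, not something the Lustre build does: `strip_ib_symvers` is a hypothetical helper name, and it assumes that the in-kernel IB entries are exactly those whose module column (the third whitespace-separated field) lies under `drivers/infiniband/`.

```shell
#!/bin/sh
# Print a Module.symvers file with the in-kernel IB entries removed.
# Assumption: columns are CRC, symbol, module path, export type, and
# IB symbols are those whose module path starts with drivers/infiniband/.
strip_ib_symvers() {
    awk '$3 !~ /^drivers\/infiniband\// { print }' "$1"
}
```

One way to use it is to keep a backup of the kernel's Module.symvers and write the filtered output over the original, for example `cp Module.symvers Module.symvers.orig && strip_ib_symvers Module.symvers.orig > Module.symvers`, so the change can be reverted.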
| Comments |
| Comment by Peter Jones [ 28/Feb/11 ] |
|
Brian, you are the engineer who has worked in this area. Could you please comment? Thanks, Peter |
| Comment by Peter Jones [ 28/Feb/11 ] |
|
Kit, Priority=Importance and Severity=Impactfulness. Usually these are in sync, but it might be possible to have a low priority, high severity issue (a theoretical system crasher that never occurs in real life) or a high priority, low severity issue (Lustre misspelt somewhere very noticeable). HTH, Peter |
| Comment by Brian Murrell (Inactive) [ 03/Mar/11 ] |
|
Kit, from looking at the code (lnet/autoconf/lustre-lnet.m4:447 in current master) and the kernel documentation for Module.symvers AFAICT, we are doing the right thing already, modulo the lack of specification in the kernel's modules.txt about which Module.symvers takes precedence when a symbol exists in both the kernel's own Module.symvers and an external module's Module.symvers. As far as having to remove the IB symbols from the kernel's Module.symvers, obviously this is not something the Lustre build code can do as it would be (IMHO) a serious issue to go molesting the kernel source tree as part of a module build. I will keep investigating to see if I can resolve the question about preference when more than one Module.symvers has a given symbol. |
| Comment by Brian Murrell (Inactive) [ 04/Mar/11 ] |
|
I can't reproduce this. I built OFED-1.5.2's kernel-ib{,-devel} and installed the kernel-ib-devel. I then built lustre with:

$ ./configure --with-linux=/usr/src/kernels/2.6.18-194.32.1.el5-x86_64/ --with-o2ib=/usr/src/ofa_kernel/ --without-lustre-iokit --disable-quilt --disable-liblustre --disable-docs --disable-snmp --disable-tests
$ make

When I interrogate lnet/klnds/o2iblnd/ko2iblnd.ko for the version of a given I/B function that it wants I get:

$ /sbin/modprobe --dump-modversions lustre-release/lnet/klnds/o2iblnd/ko2iblnd.ko | grep ib_create_cq
0x17fa926d ib_create_cq

When I then interrogate the kernel and OFED Module.symvers files:

$ grep ib_create_cq /usr/src/ofa_kernel/Module.symvers /usr/src/kernels/2.6.18-194.32.1.el5-x86_64/Module.symvers
/usr/src/ofa_kernel/Module.symvers:0x17fa926d ib_create_cq drivers/infiniband/core/ib_core EXPORT_SYMBOL
/usr/src/kernels/2.6.18-194.32.1.el5-x86_64/Module.symvers:0x1e8a9cb5 ib_create_cq drivers/infiniband/core/ib_core EXPORT_SYMBOL

So as you can see, ko2iblnd.ko is looking for the OFED version of the function, not the kernel's own version. Can you provide any more information as to how you are arriving at different results? Or maybe where I am misunderstanding your problem? |
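The comparison in the comment above can be scripted. This is a minimal sketch, assuming the usual Module.symvers layout (CRC, symbol, module path, export type, separated by whitespace); `symvers_crc` and `crc_origin` are hypothetical helper names and are not part of Lustre or OFED.

```shell
#!/bin/sh
# Print the CRC recorded for a symbol in a Module.symvers file.
# Arguments: Module.symvers path, symbol name.
symvers_crc() {
    awk -v sym="$2" '$2 == sym { print $1; exit }' "$1"
}

# Report which Module.symvers matches the CRC a module expects.
# Arguments: expected CRC, symbol, OFED Module.symvers, kernel Module.symvers.
# Prints "ofed", "kernel", or "unknown".
crc_origin() {
    want=$1
    sym=$2
    # An empty expected CRC can never be attributed to either file.
    if [ -z "$want" ]; then
        echo unknown
    elif [ "$want" = "$(symvers_crc "$3" "$sym")" ]; then
        echo ofed
    elif [ "$want" = "$(symvers_crc "$4" "$sym")" ]; then
        echo kernel
    else
        echo unknown
    fi
}
```

The expected CRC would come from the module itself, for example `want=$(/sbin/modprobe --dump-modversions ko2iblnd.ko | awk '$2 == "ib_create_cq" { print $1 }')`, followed by `crc_origin "$want" ib_create_cq /usr/src/ofa_kernel/Module.symvers /usr/src/kernels/2.6.18-194.32.1.el5-x86_64/Module.symvers`. A result of `kernel` would indicate the mismatch this ticket describes.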
| Comment by Brian Murrell (Inactive) [ 08/Mar/11 ] |
|
Kit, Were you able to gather any more information on this issue? |
| Comment by Kit Westneat (Inactive) [ 11/Mar/11 ] |
|
Hey Brian, I haven't had time to work on this, unfortunately. I should be able to next week. Thanks, |
| Comment by Kit Westneat (Inactive) [ 18/Mar/11 ] |
|
It looks like it is working, so I'm not sure what was going on. I'll close it up. |
| Comment by Kit Westneat (Inactive) [ 18/Mar/11 ] |
|
Actually, is it possible for me to close it? I can't find that functionality. |
| Comment by Peter Jones [ 18/Mar/11 ] |
|
Not sure, but I have closed it for you Kit |