[LU-2262] unknown symbols when compiling lustre client without specifying --disable-server in ./configure command Created: 02/Nov/12  Updated: 08/Dec/13  Resolved: 08/Dec/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.3.0
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Frederik Ferner (Inactive)
Assignee: Minh Diep
Resolution: Cannot Reproduce
Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 5411

 Description   

While evaluating Lustre 2.3 for our environment, I tried to recompile the Lustre client for a later version of the Red Hat kernel, using the following commands:

git checkout -b b2_3 remotes/origin/b2_3
sh autogen.sh
./configure --with-linux=/usr/src/kernels/2.6.32-279.11.1.el6.x86_64

The build completed, but I got the warnings below along the way.

configure: WARNING:

Disabling server because complete ext4 source does not exist.

If you are building using kernel-devel packages and require ldiskfs
server support then ensure that the matching kernel-debuginfo-common
and kernel-debuginfo-common-<arch> packages are installed.


[snip]

WARNING: /lib/modules/2.6.32-279.11.1.el6.x86_64/updates/kernel/fs/lustre/obdecho.ko needs unknown symbol echo_obd_ops
WARNING: /lib/modules/2.6.32-279.11.1.el6.x86_64/updates/kernel/fs/lustre/obdecho.ko needs unknown symbol echo_persistent_pages_fini
WARNING: /lib/modules/2.6.32-279.11.1.el6.x86_64/updates/kernel/fs/lustre/obdecho.ko needs unknown symbol echo_persistent_pages_init
WARNING: /lib/modules/2.6.32-279.11.1.el6.x86_64/updates/kernel/fs/lustre/ptlrpc.ko needs unknown symbol lut_boot_epoch_update
WARNING: /lib/modules/2.6.32-279.11.1.el6.x86_64/updates/kernel/fs/lustre/ptlrpc.ko needs unknown symbol lut_mod_exit
WARNING: /lib/modules/2.6.32-279.11.1.el6.x86_64/updates/kernel/fs/lustre/ptlrpc.ko needs unknown symbol lut_mod_init 
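To see at a glance which modules are missing which symbols, warnings of the form shown above can be reduced to "module symbol" pairs. This is a minimal sketch that assumes the exact "needs unknown symbol" wording in these warnings; the function name `list_unknown_symbols` is made up for illustration.

```shell
# Read warning lines on stdin and print "<module> <symbol>" for each
# "needs unknown symbol" line; all other lines are ignored.
# Assumption: the wording matches the warnings quoted in this report.
list_unknown_symbols() {
    sed -n 's|^WARNING: .*/\([^/]*\)\.ko needs unknown symbol \([A-Za-z0-9_]*\).*$|\1 \2|p'
}
```

For example, `list_unknown_symbols < build.log` (where `build.log` is a hypothetical capture of the build output) would print one line per unresolved symbol, which makes it easy to spot that only obdecho and ptlrpc are affected here.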

Trying to load the modules gave unknown symbol errors:

Nov  1 16:17:53 cs04r-sc-serv-48 kernel: Lustre: Lustre: Build Version: 2.3.0--PRISTINE-2.6.32-279.11.1.el6.x86_64
Nov  1 16:17:53 cs04r-sc-serv-48 kernel: ptlrpc: Unknown symbol lut_boot_epoch_update
Nov  1 16:17:53 cs04r-sc-serv-48 kernel: ptlrpc: Unknown symbol lut_mod_exit
Nov  1 16:17:53 cs04r-sc-serv-48 kernel: ptlrpc: Unknown symbol lut_mod_init
Nov  1 16:17:53 cs04r-sc-serv-48 modprobe: FATAL: Error inserting lustre (/lib/modules/2.6.32-279.11.1.el6.x86_64/updates/kernel/fs/lustre/lustre.ko): Unknown symbol in module, or unknown parameter (see dmesg)

Compiling the same code with --disable-server added to the configure options produces no warnings, and loading the module works as expected.

The same configure line without the --disable-server option works without problems when building the Lustre client RPMs on Lustre 1.8.x.



 Comments   
Comment by Peter Jones [ 02/Nov/12 ]

Glad to see you checking out 2.3 Frederik!

Minh could you please look into this one? Thanks!

Comment by James A Simmons [ 02/Nov/12 ]

You would need a patch similar to LU-2068.

Comment by Minh Diep [ 06/Nov/12 ]

http://review.whamcloud.com/#change,1873 removed the patched-kernel detection, which causes this issue. I am investigating further.

Comment by Brian Murrell (Inactive) [ 06/Nov/12 ]

Per our discussion, find out which of the server patches (not including the ldiskfs patches, since in theory we can still patch and build ldiskfs with a patchless server kernel) is likely to be the "last patch standing", and build an autoconf test for the presence of that patch in the kernel {source|headers} that Lustre was given to build against. That test will be your auto-detection for whether to build the server.

As an aside: there really are not too many patches left for the server and at least half of them seem generic enough that they ought to go upstream with relative ease, I would think.

Comment by Brian Murrell (Inactive) [ 06/Nov/12 ]

Ahh. Andreas and Oleg are watching this one. I wonder if one or both of them would like to advise which is likely to be the last server patch standing.

Comment by Andreas Dilger [ 06/Nov/12 ]

The last patch that we require for Lustre is dev_read_only, which is only needed for testing. See LU-20 for details of patchless server kernels.

However, I don't really think this problem is directly related to patchless kernels. Rather, it looks like a defect in the configure code: setting enable_server='no' in build/autoconf/lustre-build-ldiskfs.m4 may come too late to prevent HAVE_SERVER_SUPPORT from being set.

Possibly this is already fixed in master?
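The ordering problem suspected in this comment can be illustrated with a small shell sketch. The function names and the `ext4_source_present` variable are hypothetical stand-ins, not the actual m4 macros; only `enable_server` and `HAVE_SERVER_SUPPORT` come from the comment above.

```shell
# Hypothetical stand-in: record server support if the server is still enabled.
define_server_support() {
    [ "$enable_server" != "no" ] && HAVE_SERVER_SUPPORT=1
    return 0
}

# Hypothetical stand-in: disable the server when the ext4 source is missing.
check_ldiskfs() {
    [ "$ext4_source_present" = "yes" ] || enable_server="no"
}

# $1 selects the order of the two steps: "define-first" or "check-first".
run_configure() {
    enable_server="yes"; HAVE_SERVER_SUPPORT=""; ext4_source_present="no"
    if [ "$1" = "define-first" ]; then
        define_server_support; check_ldiskfs
    else
        check_ldiskfs; define_server_support
    fi
    echo "enable_server=$enable_server HAVE_SERVER_SUPPORT=${HAVE_SERVER_SUPPORT:-unset}"
}
```

With "define-first", the macro is already set by the time the server is disabled, so the result is a client-only build that was still compiled with server support defined. That would be consistent with ptlrpc referencing symbols from modules that were never built. With "check-first", the macro stays unset.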

Comment by Minh Diep [ 06/Nov/12 ]

no, it's not fixed in master.

Comment by Brian Murrell (Inactive) [ 07/Nov/12 ]

"a defect in the configure code that setting enable_server='no' in build/autoconf/lustre-build-ldiskfs.m4 may be too late to prevent HAVE_SERVER_SUPPORT from being set."

Yeah, that is actually the problem.

However, simply using the "can/will we build ldiskfs?" test to decide whether the Lustre servers can be built against the provided kernel source tree doesn't seem sufficient.

That test is simply "if $EXT_DIR/dir.c, $EXT_DIR/file.c, and $EXT_DIR/inode.c exist in the kernel source, then we can build ldiskfs", and by extension we are asserting that we can build Lustre servers. But Lustre servers cannot be built unless the kernel source is also patched, right? It seems a further check that the source has been patched is needed, doesn't it?
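The file-existence test described in this comment can be sketched in shell as follows. This is only an illustration of the logic as stated here, not the actual autoconf macro; the function name `can_build_ldiskfs` is made up, and the ext4 directory is passed as an argument rather than read from $EXT_DIR.

```shell
# Succeed only if dir.c, file.c, and inode.c all exist under the given
# ext4 source directory ($1). A sketch of the check, not the real macro.
can_build_ldiskfs() {
    ext_dir=$1
    for f in dir.c file.c inode.c; do
        [ -f "$ext_dir/$f" ] || return 1
    done
    return 0
}
```

Note that this check says nothing about whether the kernel source is patched, which is the gap this comment points at.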

Comment by Andreas Dilger [ 07/Nov/12 ]

No, in fact the current Lustre code does not require that the kernel be patched at all. This is required for Lustre testing (the dev_read_only patch in order to simulate server crashes at specific points in the code), but it isn't needed for normal operation.

There is a separate check whether there are ldiskfs patches for the kernel in order to enable ldiskfs, but they are not needed if building a ZFS-only server.

Comment by Minh Diep [ 07/Nov/12 ]

does this mean we should restore the LUSTRE_KERNEL_VERSION check?

Comment by Minh Diep [ 07/Nov/12 ]

Or disable the server by default and use --enable-server when we want to build the server.

Comment by Andreas Dilger [ 08/Nov/12 ]

I think we want to continue to default to building the server code, and definitely do not want to introduce version checks. If neither ldiskfs patches for the current kernel nor ZFS libraries are available, then the server code could be disabled.

That said, whether the server modules are built or not, that shouldn't prevent the client modules from being usable. Only the osd-ldiskfs and osd-zfs modules should really need to interact with the backing storage, and everything else should be "patchless" already. The real question is why ptlrpc is referencing symbols only in server modules, but those modules were apparently not built?

The LU-1330 patches already in Gerrit would help separate the client and server code, but I don't know if it is the whole solution to the problem here or not.
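The default proposed in this comment (build the server unless neither ldiskfs patches nor ZFS libraries are available) can be sketched as a small decision function. The function name and its yes/no arguments are assumptions for illustration; how each condition would actually be detected is not specified here.

```shell
# $1 = "yes" if ldiskfs patches exist for the current kernel
# $2 = "yes" if ZFS libraries are available
# Prints "yes" if the server code would be built by default, else "no".
server_build_default() {
    if [ "$1" = "yes" ] || [ "$2" = "yes" ]; then
        echo yes
    else
        echo no
    fi
}
```

Under this rule no kernel version check is involved: the server is only dropped when neither backing filesystem can be built.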

Comment by Brian Murrell (Inactive) [ 08/Nov/12 ]

"No, in fact the current Lustre code does not require that the kernel be patched at all."

Are there any penalties (i.e. performance, races, etc.) for not using the patches?

If the only patch really necessary is for testing, and there are no performance or operational issues with not using any of the other patches (assuming they are not using RAID, or the fusion mpt controller, etc), why are we not advertising to end users that they can build lustre servers against their choice of stock (EL6) vendor kernel (within the limits of supported kernel versions)? This would be huge!

Comment by Andreas Dilger [ 10/Nov/12 ]

There are some issues fixed by the patches that are not needed for performance/correctness on most systems:

  • bh_lru_size_config.patch - needed for good pdirops performance
  • blkdev_tunables-2.6-rhel6.patch - needed for 1MB IO size (at least it used to be, maybe SG chaining fixes this?)
  • mpt-fusion-max-sge-rhel6.patch - 1MB IO size, only needed for MPT Fusion cards
  • raid5-mmp-unplug-dev-rhel6.patch - for MMP on MD RAID devices

It would definitely make sense to try and get these into the upstream kernels if we could.

Comment by Minh Diep [ 26/Jul/13 ]

FYI, I tried it on the latest master and saw no issue.

Comment by Minh Diep [ 26/Jul/13 ]

hmm...I don't see this issue anymore on b2_3

Frederik, could you double check? - thanks

Comment by Minh Diep [ 02/Dec/13 ]

Frederik, please check whether you still see this issue. I will close this in about a week. Thanks.
