Details
- Bug
- Resolution: Fixed
- Minor
- None
- Lustre 1.8.7
- None
- Scientific Linux 6.5
- 3
- 16616
Description
Hello Lustre team!
We're seeing an issue where the ko2iblnd module is no longer built properly from our Lustre 1.8.7 source against the latest 2.6.32-504.x Linux kernel.
We have successfully built RPMs with the command below for several kernel revisions: 2.6.32_220.x, 2.6.32_279.x, 2.6.32_358.x, and 2.6.32_431.x.
./configure --with-linux=/lib/modules/KERNEL_VERSION/build --disable-lru-resize --enable-ext4 --disable-server
However, with the release of 2.6.32_504.x we are now seeing that the ko2iblnd module isn't built, and as a result we get I/O errors when loading the lustre module:
LustreError: 2897:0:(api-ni.c:1081:lnet_startup_lndnis()) Can't load LND o2ib, module ko2iblnd, rc=256
LustreError: 2897:0:(events.c:725:ptlrpc_init_portals()) network initialisation failed
After searching Google and other Jira tickets, we tried "--with-o2ib=yes", which produces the following error (probably just a red herring):
configure: error: can't compile with kernel OpenIB gen2 headers
Given that the other builds were successful, I am fairly certain that this is a kernel issue, but I wanted to double-check here first. I've attached the config.log for perusal.
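When configure stops with this message, the compiler output of the failed test is normally recorded in config.log just above the error line. A minimal sketch for pulling out that context, assuming a grep that supports -B and a config.log in the build directory; the function name and the 30-line window are illustrative choices, not part of the Lustre build system:

```shell
# Print the lines that precede the o2ib configure error in a config.log.
# show_o2ib_failure is a hypothetical helper; the 30-line window is arbitrary.
show_o2ib_failure() {
    log=${1:-config.log}
    grep -B 30 "can't compile with kernel OpenIB gen2 headers" "$log"
}

# Example: show_o2ib_failure ./config.log
```

The compiler errors printed just before the matching line usually indicate which kernel header or symbol changed.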
Here is an RPM package query comparing the latest build with the two most recent builds before it:
- pwd
/root/rpmbuild/RPMS/x86_64
- ls -1t lustre-modules-1.8.7-2.6.32_*|head -3
lustre-modules-1.8.7-2.6.32_504.1.3.el6.x86_64.x86_64.rpm
lustre-modules-1.8.7-2.6.32_431.29.2.el6.x86_64.x86_64.rpm
lustre-modules-1.8.7-2.6.32_431.23.3.el6.x86_64.x86_64.rpm
- ls -1t lustre-modules-1.8.7-2.6.32_*|head -3|xargs -I'{}' rpm -qlp {} |grep ko2iblnd
/lib/modules/2.6.32-431.29.2.el6.x86_64/updates/kernel/net/lustre/ko2iblnd.ko
/lib/modules/2.6.32-431.23.3.el6.x86_64/updates/kernel/net/lustre/ko2iblnd.ko
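The query above only lists the packages that contain the module, so the missing one has to be inferred. A small sketch that reports each package explicitly, assuming rpm is installed; check_ko2iblnd is a hypothetical helper name, and the example path is the build directory shown above:

```shell
# Report, per package, whether ko2iblnd.ko appears in its file list.
# check_ko2iblnd is a hypothetical helper, not a Lustre or rpm tool.
check_ko2iblnd() {
    for pkg in "$@"; do
        if rpm -qlp "$pkg" 2>/dev/null | grep -q 'ko2iblnd\.ko$'; then
            echo "OK       $pkg"
        else
            echo "MISSING  $pkg"
        fi
    done
}

# Example:
# check_ko2iblnd /root/rpmbuild/RPMS/x86_64/lustre-modules-1.8.7-2.6.32_*.rpm
```

Running this right after each rebuild makes a silently skipped o2ib build visible before the package is installed.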
Thank you for any guidance and/or additional information.
John DeSantis
Issue Links
- is related to
- LU-5909 Kernel update [RHEL6.6 2.6.32-504.1.3.el6] (Resolved)
Hi John,
I'm not aware of performance problems on either servers or clients in the 1.8 branch. There were quite a few changes to the code for the 1.8.9 release, but I'm not a developer, just a user of Lustre. Regarding your performance report: for large files (at least with large I/O chunks), it makes sense to set striping in order to use the parallelism of the file system. However, this also depends on the use case. If your application does random access, leaving the file unstriped may be the better choice even when it is large. So it is always difficult to tell whether a change in performance is good or bad, because it is a trade-off between different use cases. A decrease in performance for bulk I/O to a large non-striped file might come with an improvement for small files or for random I/O.
I have also been using the patch from http://review.whamcloud.com/#/c/8607/5 for quite a while now (probably since the upgrade from SL6.4 to 6.5 or so; the other one has been needed since SL6.6 came out).
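As a rough illustration of the striping trade-off, the default layout can be set per directory with lfs setstripe. This is only a sketch: the function names and the stripe count of 4 are assumptions chosen for illustration, and the right values depend on the workload and the number of OSTs.

```shell
# Hypothetical helpers; the directory argument and stripe counts are examples.
stripe_for_bulk_io() {
    # New files in this directory are striped across 4 OSTs.
    lfs setstripe -c 4 "$1"
}

stripe_for_random_io() {
    # New files in this directory stay on a single OST.
    lfs setstripe -c 1 "$1"
}

# Example: stripe_for_bulk_io /mnt/lustre/bulk
# lfs getstripe <path> shows the layout actually in use.
```

Existing files keep the layout they were created with, so a comparison needs files written after the directory default was changed.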
Just a remark, since we are looking at llite: if you plan to connect to servers running Lustre > 2.4, you should also include http://review.whamcloud.com/#/c/5971/ as a fix for LU-3067 (this is a correction to commit http://review.whamcloud.com/#/c/5285/, which is included in 1.8.9).
In the meantime, my build for kernel 2.6.32-504.1.3 has finished as well, and ko2iblnd.ko was built successfully.
@Developers: could someone pick up the patch for the kernel-2.6.32-504-series and submit it to the 1.8 branch, please? And maybe the other one as well?
best regards,
Martin