[LU-5952] ko2iblnd not built with kernel 2.6.32-504.1.3 Created: 24/Nov/14 Updated: 09/Oct/21 Resolved: 09/Oct/21 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 1.8.7 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | John DeSantis | Assignee: | WC Triage |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Scientific Linux 6.5 |
||
| Attachments: |
|
||||||||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 16616 | ||||||||
| Description |
|
Hello Lustre team!

We're seeing an issue where the ko2iblnd module is no longer built from our Lustre 1.8.7 source against the latest 2.6.32-504.x Linux kernel. We have successfully built RPMs with the command below for several kernel revisions: 2.6.32_220.x, 2.6.32_279.x, 2.6.32_358.x, and 2.6.32_431.x.

    ./configure --with-linux=/lib/modules/KERNEL_VERSION/build --disable-lru-resize --enable-ext4 --disable-server

However, with the release of 2.6.32_504.x the ko2iblnd module isn't built, and as a result we get I/O errors when loading the lustre module:

    LustreError: 2897:0:(api-ni.c:1081:lnet_startup_lndnis()) Can't load LND o2ib, module ko2iblnd, rc=256

Looking over Google and other Jira tickets, we found that using "--with-o2ib=yes" produces the error below (which is probably just a red herring):

    configure: error: can't compile with kernel OpenIB gen2 headers

Given that the other builds were successful, I am fairly certain that this is a kernel issue, but I wanted to double-check here first. I've attached the config.log for perusal.

Here is an RPM package query from the latest build to the most recent two:
Thank you for any guidance and/or additional information. John DeSantis |
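For anyone triaging a similar failure, a minimal sketch of pulling the o2ib-related lines out of a config.log follows. The helper name `o2ib_lines` is hypothetical, and the match strings are taken only from the LND name and the configure error quoted above, so the real log may word its checks differently.

```shell
#!/bin/sh
# Hypothetical helper: print the lines of a configure log that mention
# o2ib or OpenIB, with line numbers, so the failing check is easier to find.
o2ib_lines() {
    # $1: path to config.log (defaults to ./config.log)
    grep -n -i -e 'o2ib' -e 'openib' "${1:-config.log}"
}

# Example (not run here; the path is a placeholder):
#   o2ib_lines /path/to/lustre-1.8.7/config.log
```

The line numbers make it easier to jump to the test program and compiler output that config.log records around a failed check.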
| Comments |
| Comment by John DeSantis [ 24/Nov/14 ] |
|
Forgot to initially include the config.log. |
| Comment by Martin Hecht [ 25/Nov/14 ] |
|
@John: I had the same problem - I had to backport a commit from the lustre 2-branch ( @Team: could you pick up the patch from there for review? |
| Comment by John DeSantis [ 25/Nov/14 ] |
|
Martin,

Thanks for supplying the patch and the discussion thread! Unfortunately, the module still isn't getting built. I tried several different approaches:

1.) Applied the patch to our 1.8.7 "build" directory; there were some hunk errors applying the patch, but otherwise the RPMs were built. Querying the module package showed that the ko2iblnd module was still missing when compiled against 2.6.32-504.1.3.
2.) Repeated step #1 above, but this time I cherry-picked the sections of your patch which applied to the files in question (Makefile, autoMakefile.am.toplevel, and build/autoconf/lustre-build-linux.m4) so as to avoid the hunk errors while patching. This resulted in RPM build errors.
3.) Downloaded the lustre 1.8.7 source from http://downloads.lustre.org/public/lustre/v1.8/lustre_1.8.7/source/ and attempted a rebuild after applying the patch http://review.whamcloud.com/#/c/8607/ and your recent patch. Once I attempted to "make rpms", I received build errors.
4.) Downloaded the latest git repo for b1_8 and applied your patch. I then ran autogen.sh and attempted to build. I did not receive any errors, but the ko2iblnd module was missing, as with step #1.

I've attached another file which shows the patch errors which led to cherry-picking and some of the errors experienced when "make rpms" was used. If any of the information seems incomplete, I'll redo the steps above and save all of the output for perusal. During my testing today it's possible that I may have done something out of order.

Again, thanks thus far for your help.

John DeSantis |
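The "querying the module package" check used in steps 1 and 4 can be sketched as below. This is an illustration only: `has_module` and `check_rpm` are hypothetical helper names, and the package file name in the example is an assumption about how the client modules RPM is named.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the file list on stdin contains <name>.ko.
has_module() {
    grep -q "/$1\.ko\$"
}

# Hypothetical wrapper: report whether one built package file carries ko2iblnd.
check_rpm() {
    # rpm -qpl lists the files contained in a package file
    if rpm -qpl "$1" | has_module ko2iblnd; then
        echo "ko2iblnd present in $1"
    else
        echo "ko2iblnd MISSING from $1"
    fi
}

# Example (not run here; the package file name is an assumption):
#   check_rpm lustre-client-modules-1.8.7-*.rpm
```

Running the same check on a package built for an older kernel gives a baseline to compare against.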
| Comment by Martin Hecht [ 26/Nov/14 ] |
|
Hi John, Basically, your attempt 4) should be working. This works fine for me: The patch applies cleanly in my case. Now I noticed that I have updated my local git copy some time before the last commits and built lustre with kernel 2.6.32-504. best regards, Martin |
| Comment by John DeSantis [ 26/Nov/14 ] |
|
Martin,

Too bad you're not local, I'd get you some pizza and beer.

Yesterday, I did not clone the git repository correctly. So, after applying your patch and the patch from http://review.whamcloud.com/#/c/8607/5 (basically the same instructions as in https://lists.01.org/pipermail/hpdd-discuss/2014-November/001409.html), I was able to get the RPMs built and verified that the missing module was present.

The remaining issue that I'm seeing is degraded performance. I've attached a log file if you wish to review it. Before I start looking into this more deeply, do you know if there are any known performance problems when server and client versions do not match?

Again, thank you very, very much. Without your help we would not have progressed far.

John DeSantis |
| Comment by Martin Hecht [ 26/Nov/14 ] |
|
Hi John,

I'm not aware of performance problems for either servers or clients in the 1.8 branch. There were quite a few changes to the code for the 1.8.9 release, but I'm not a developer, just a user of lustre.

Commenting on your performance report: for large files (at least if you have large chunks of IO) it makes sense to set striping in order to make use of the parallelism in the file system. However, this also depends on the use case. If your application does random access, the better choice may be not to stripe the file even when it is large. So it is always difficult to tell whether a change in performance is good or bad, because it is a trade-off between different use cases. A decrease in performance for bulk IO to a large non-striped file might go along with an improvement in performance for small files or for random IO.

Actually, I have been using the patch from http://review.whamcloud.com/#/c/8607/5 as well for quite a while now (probably since the upgrade from SL6.4 to 6.5 or so... the other one is needed since SL6.6 is out). Just a remark since we are looking at llite: If you plan to connect to servers with lustre > 2.4, you should also include

In the meantime my build for kernel 2.6.32-504.1.3 is done as well, and ko2iblnd.ko has been built successfully.

@Developers: could someone pick up the patch for the kernel-2.6.32-504-series and submit it to the 1.8 branch, please? And maybe the other one as well?

best regards, |
| Comment by John DeSantis [ 26/Nov/14 ] |
|
Martin,

I completely agree with striping, not just for I/O but also because we recently ran into a case where the 2.0 TB file size limit was reached because the file wasn't striped over all available OSTs.

I was able to get the 1.8.7 client RPMs built (using 1.8.7_wc1) with the ko2iblnd module present using your patch, the

I will be doing some testing over the next few days to see if there are any differences in performance with "default" values between the client versions 1.8.7 and 1.8.9.

Again, without your patch we would have been stuck. So I, too, can confirm to the @Developers that your patch has worked.

John DeSantis |
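The 2.0 TB case above suggests the limit applies per stripe object, so the stripe count bounds the largest file that can be written. A minimal sketch of that arithmetic follows, assuming exactly 2 TB per object and whole-TB file sizes. `min_stripes` is a hypothetical helper name, and the paths in the comments are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: smallest stripe count for a file of $1 TB,
# assuming each stripe object can hold at most 2 TB (integer TB only).
min_stripes() {
    per_object_tb=2
    # Integer ceiling division: round up to the next whole stripe.
    echo $(( ($1 + per_object_tb - 1) / per_object_tb ))
}

min_stripes 5    # prints 3

# A directory can then be given a default stripe count, for example:
#   lfs setstripe -c 3 /path/to/dir      (3 stripes)
#   lfs setstripe -c -1 /path/to/dir     (-1 stripes over all available OSTs)
```

As Martin notes, a wider stripe count is a trade-off: it raises the size ceiling and helps large sequential IO, but may not suit small files or random access.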
| Comment by John DeSantis [ 26/Nov/14 ] |
|
Martin, I was able to perform some testing. With your patch and 1.8.7, we're able to mount Lustre without any error and we're seeing expected performance between nodes. John DeSantis |