[LU-3173] Incorrect detection of ldiskfs series to use Created: 15/Apr/13 Updated: 04/Nov/13 Resolved: 11/Jul/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.5.0, Lustre 2.4.2 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Dmitry Eremin (Inactive) | Assignee: | Dmitry Eremin (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 7733 | ||||||||
| Description |
|
The problem is in AS_VERSION_COMPARE([$LINUXRELEASE],[2.6.32-343] in ldiskfs-build.m4. The result of string comparison of "2.6.32.lustremaster" and "2.6.32-343" give a great and series for rhel6.4 are chosen. The value of EXTRAVERSION in kernel Makefile should be chosen careful for subsequent string comparison or we need to change the code of series selection. |
| Comments |
| Comment by Dmitry Eremin (Inactive) [ 15/Apr/13 ] |
|
The fix for selection of ldiskfs series to use is http://review.whamcloud.com/6056 |
| Comment by Christopher Morrone [ 22/Apr/13 ] |
|
I sounds like the Lustre build of the Linux kernel is currently converting the upstream kernel string from 2.6.32-343 to 2.6.32.lustremaster. That new version string is kind of horrible. How about you just fix that? |
| Comment by Dmitry Eremin (Inactive) [ 23/Apr/13 ] |
|
According https://wiki.hpdd.intel.com/display/PUB/Walk-thru-+Build+Lustre+MASTER+on+RHEL+6.3+from+Whamcloud+git we recommend to edit kernel sources Makefile and modify line 4: EXTRAVERSION = .lustremaster Therefore, the kernel version produced from those sources will be: #define UTS_RELEASE "2.6.32.lustremaster" So, we try to use this version as upstream kernel version. I don't know how we produce pre-built RPMs packages but WiKi recommendation about building Lustre master from git sources is not correct. Anyway we try to rely on string representation which is not a kernel version only. The kernel version is "2.6.32". The suffix "-358.2.1.el6" is a release. $ rpm -qi kernel Name : kernel Relocations: (not relocatable) Version : 2.6.32 Vendor: CentOS Release : 358.2.1.el6 Build Date: Wed 13 Mar 2013 05:35:54 AM MSK Now we use "AS_VERSION_COMPARE([$LINUXRELEASE],[2.6.32-343]," which is not robust as I mentioned above. In the current patch I try to use RHEL_RELEASE_CODE which is just encoding of RHEL_MAJOR and RHEL_MINOR. Probably we could use RHEL_RELEASE which is more precise. But anyway I think we should not rely on string which user can easily change with his custom version. |
| Comment by Christopher Morrone [ 23/Apr/13 ] |
I would strongly recommend changing those instructions. Replacing the very useful release string from upstream with the very vague string ".lustremaster" is just bad advice. Case in point: you guys aren't following that advice! Check your kernel packages. LLNL certainly doesn't choose to rename our kernels that way, and never will. In summary, I think that using the version+release field is plenty robust, if we simply stop giving users bad advice about trashing the release field. The Version+Release string is simple and straight forward to understand. It is easy for a human to match up with rpm names. The Version+Release also more uniquely identifies a kernel. Multiple differing kernels can have the same RHEL_RELEASE_CODE, and that could be a problem for properly idenifying which patches are needed. Honestly, if a user is building their own kernel, they can quite easily change any of the numbers. Lets not worry about all of the crazy things people might do, and focus on the things we normally do. |
| Comment by Dmitry Eremin (Inactive) [ 23/Apr/13 ] |
|
I don't propose to change this approach. I just would like to make this approach more stable regardless user changes. My patch #3 (http://review.whamcloud.com/6056) just extract correct upstream kernel release for RHEL kernels and form it into $(VERSION).$(PATCHLEVEL).$(SUBLEVEL)-$(RHEL_RELEASE). It's exactly Version+Release (2.6.32-358) format like you propose to follow without any additional info and it not depends from EXTRAVERSION which user can change. If EXTRAVERSION is not empty kernel script remove all RHEL release information from kernel version. So, user should carefully write the same RHEL release version as he see in original kernel package. It's not convenient I think. For example: $ rpm -qi kernel Name : kernel Relocations: (not relocatable) Version : 2.6.32 Vendor: CentOS Release : 358.2.1.el6 Build Date: Wed 13 Mar 2013 05:35:54 AM MSK # Therefore write to Makefile EXTRAVERSION = -358.2.1.el6_lustre Moreover the directory name with produced kernel will be ~/rpmbuild/BUILD/kernel-2.6.32358.2.1.el6_lustre.x86_64. It looks confusing at least for me but user should use it in Lustre configure script. So, what is wrong with my #3 patch? Are you insist on WiKi change only? |
| Comment by Christopher Morrone [ 23/Apr/13 ] |
|
Yes, I think that a Wiki-only change is the appropriate fix for this problem. I don't see much value in the additional complexity that the patch introduces. |
| Comment by Yang Sheng [ 23/Apr/13 ] |
|
Hi, Christopher, I think this change is needed. I have received many guys encountered issue when they are build kernel and lustre on local box. The version come from utsrelease.h is depend on environment. But RHEL_RELEASE is stable always. So i think we use it more reasonable. |
| Comment by Christopher Morrone [ 23/Apr/13 ] |
|
Isn't that because you guys are giving them bad instructions right now? With proper instructions, doesn't that problem go away? |
| Comment by Yang Sheng [ 24/Apr/13 ] |
|
It obviously not. Not everyone build kernel from spec file. It also not necessary step. We haven't this problem in past, at least we should restore the behavior before. |
| Comment by Dmitry Eremin (Inactive) [ 24/May/13 ] |
|
Additional drawback of approach to play with EXTRAVERSION only is incorrect specification of the kernel version for dependencies. For example, if I set the following in Makefile: VERSION = 2 PATCHLEVEL = 6 SUBLEVEL = 32 EXTRAVERSION = -358.6.2.lum I will got a kernel RPM with following information: Name : kernel Relocations: (not relocatable) Version : 2.6.32358.6.2.lum Vendor: The Linux Community Release : 1 Build Date: Tue 21 May 2013 04:24:59 PM MSK But a lustre-modules will required dependency from "kernel = 2.6.32-358.6.2.lum". So, additional thing we need to fix is "krequires" macro value from lustre.spec file. |
| Comment by Christopher Morrone [ 24/May/13 ] |
|
You know, more and more I think this is just completely the wrong approach to the problem. We are trying to do two things that are conceptually in conflict:
How about this instead: If a user wishes to change the kernel version number, they need to use a configure option to specify which patch set they wish to use. They chose to throw away information, so make them tell ldiskfs which patch set to use. Adding complexity to try to guess when they are making it hard for us seems counterproductive to me. |