[LU-4287] Kernel update [RHEL6.5 2.6.32-431.3.1.el6] Created: 21/Nov/13 Updated: 14/Feb/14 Resolved: 10/Feb/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.6.0, Lustre 2.5.1 |
| Type: | Improvement | Priority: | Minor |
| Reporter: | Yang Sheng | Assignee: | Yang Sheng |
| Resolution: | Fixed | Votes: | 1 |
| Labels: | None | ||
| Issue Links: |
|
||||
| Rank (Obsolete): | 11766 | ||||
| Description |
|
This update fixes the following security issues:
Red Hat would like to thank Stephan Mueller for reporting CVE-2013-4345.
This update also fixes several hundred bugs and adds enhancements. Refer to
All Red Hat Enterprise Linux 6 users are advised to install these updated
Bugs fixed (https://bugzilla.redhat.com/):
627128 - kernel spec: devel_post macro: hardlink fc typo |
| Comments |
| Comment by Fredrik Nyström [ 27/Nov/13 ] |
|
The definitions of getname() and putname() in /usr/src/kernels/2.6.32-431.el6.x86_64/include/linux/fs.h have changed. I was able to build the 1.8 client by introducing local getname() and putname() in lustre/llite/dir.c, the same way as was done here: I suspect this will also be needed for 2.1. Regards / Fredrik |
| Comment by Bob Glossman (Inactive) [ 04/Dec/13 ] |
|
Seeing build failures in 6.5. Even a simple client build now fails. Example:
CC [M] /home/bogl/lustre-release/libcfs/libcfs/linux/linux-tracefile.o
In file included from /home/bogl/lustre-release/libcfs/include/libcfs/libcfs.h:304,
from /home/bogl/lustre-release/libcfs/libcfs/linux/linux-tracefile.c:40:
/home/bogl/lustre-release/libcfs/include/libcfs/params_tree.h:99: error: conflicting types for ‘PDE’
/usr/src/kernels/2.6.32-431.el6.x86_64/include/linux/proc_fs.h:323: note: previous definition of ‘PDE’ was here
make[6]: *** [/home/bogl/lustre-release/libcfs/libcfs/linux/linux-tracefile.o] Error 1
make[5]: *** [/home/bogl/lustre-release/libcfs/libcfs] Error 2
make[4]: *** [/home/bogl/lustre-release/libcfs] Error 2
make[3]: *** [_module_/home/bogl/lustre-release] Error 2
make[3]: Leaving directory `/usr/src/kernels/2.6.32-431.el6.x86_64'
make[2]: *** [modules] Error 2
make[2]: Leaving directory `/home/bogl/lustre-release'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/bogl/lustre-release'
make: *** [all] Error 2
I suspect this is due to the recent patch for #define HAVE_ONLY_PROCFS_SEQ 1. I believe this led to the build problems. I did some client-only builds against the RHEL 6.5 kernel before the recent patch and didn't have this problem. |
| Comment by James A Simmons [ 04/Dec/13 ] |
|
Try this patch - http://review.whamcloud.com/#/c/8482 |
| Comment by Bob Glossman (Inactive) [ 04/Dec/13 ] |
|
Tried it on CentOS 6.5. Works for me. |
| Comment by Karsten Weiss [ 06/Dec/13 ] |
|
To which Lustre version does this issue apply? I was able to build Lustre client 2.5.0 on RHEL 6.5's kernel 2.6.32-431.el6 but we ran into Is there a patch to build Lustre client 2.4.x on 2.6.32-431.el6? I also don't see this issue on the issue list for Lustre 2.4.2. |
| Comment by James A Simmons [ 06/Dec/13 ] |
|
The build issue only exists on master (2.6 branch) for the RHEL 6.5 build. |
| Comment by Fredrik Nyström [ 06/Dec/13 ] |
I was able to build Lustre client 2.4.x on 2.6.32-431.el6 after applying the following patch.
diff --git a/lustre/llite/dir.c b/lustre/llite/dir.c
index febf6ea..484d177 100644
--- a/lustre/llite/dir.c
+++ b/lustre/llite/dir.c
@@ -1228,6 +1228,30 @@ out:
RETURN(rc);
}
+static char *
+ll_getname(const char __user *filename)
+{
+ int ret = 0, len;
+ char *tmp = __getname();
+
+ if (!tmp)
+ return ERR_PTR(-ENOMEM);
+
+ len = strncpy_from_user(tmp, filename, PATH_MAX);
+ if (len == 0)
+ ret = -ENOENT;
+ else if (len > PATH_MAX)
+ ret = -ENAMETOOLONG;
+
+ if (ret) {
+ __putname(tmp);
+ tmp = ERR_PTR(ret);
+ }
+ return tmp;
+}
+
+#define ll_putname(filename) __putname(filename)
+
static long ll_dir_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
{
struct inode *inode = file->f_dentry->d_inode;
@@ -1430,7 +1454,7 @@ free_lmv:
if (!(exp_connect_flags(sbi->ll_md_exp) & OBD_CONNECT_LVB_TYPE))
return -ENOTSUPP;
- filename = getname((const char *)arg);
+ filename = ll_getname((const char *)arg);
if (IS_ERR(filename))
RETURN(PTR_ERR(filename));
@@ -1441,7 +1465,7 @@ free_lmv:
rc = ll_rmdir_entry(inode, filename, namelen);
out_rmdir:
if (filename)
- putname(filename);
+ ll_putname(filename);
RETURN(rc);
}
case LL_IOC_LOV_SWAP_LAYOUTS:
@@ -1461,7 +1485,7 @@ out_rmdir:
if (cmd == IOC_MDC_GETFILEINFO ||
cmd == IOC_MDC_GETFILESTRIPE) {
- filename = getname((const char *)arg);
+ filename = ll_getname((const char *)arg);
if (IS_ERR(filename))
RETURN(PTR_ERR(filename));
@@ -1528,7 +1552,7 @@ out_rmdir:
out_req:
ptlrpc_req_finished(request);
if (filename)
- putname(filename);
+ ll_putname(filename);
return rc;
}
case IOC_LOV_GETINFO: {
Similar issues with all releases < 2.5 |
| Comment by Bob Glossman (Inactive) [ 06/Dec/13 ] |
|
I think that's http://review.whamcloud.com/5781, in master and b2_5. It is planned to be added to b2_4 before we support Centos/RHEL 6.5 there. |
| Comment by Karsten Weiss [ 06/Dec/13 ] |
|
Thanks Fredrik, with your patch it finally compiles. (I already tried a similar patch yesterday but probably made a mistake...) |
| Comment by Yang Sheng [ 11/Dec/13 ] |
|
Patch committed to: http://review.whamcloud.com/#/c/8549/ |
| Comment by Bob Glossman (Inactive) [ 12/Dec/13 ] |
|
Adding discussion about http://review.whamcloud.com/#/c/8549 here as I don't want to fill up the review in Gerrit with what may become irrelevant comments.
I notice you have carefully forked the ldiskfs portions of the mod so we can still build on earlier el6 as well as 6.5. However, the same wasn't done for the base kernel. For example, lustre/kernel_patches/patches/raid5-mmp-unplug-dev-rhel6.patch was altered in such a way that it will now only apply onto 6.5, instead of a new version of this patch being made for 6.5. No extensions to lbuild to select between 6.4 and 6.5 were made. If this strategy is OK and we are agreed to abandon patching and building earlier kernels, then this is probably right. If it's not OK, then it isn't right.
BTW, do the revisions for ldiskfs support here imply that some similar changes will be needed in http://review.whamcloud.com/7263 ?
Just as an aside, how did you find the issue needing a change in the sanity.sh test? I have been through the release notes for 6.5 and didn't notice anything about it. |
| Comment by Yang Sheng [ 12/Dec/13 ] |
|
As far as I know, we only keep earlier versions for the ldiskfs patches, not the base kernel patches. Yes, 3.11 will also work this way. I'll update it. As for the sanity test_17g failure, it looks like Red Hat brought in a patch that did not come from upstream. I think there is an obvious issue in the function do_getname(), so we need to skip the test for now. |
| Comment by Christopher Morrone [ 12/Dec/13 ] |
I can explain at least the history there. The core lustre folks have historically never cared about making the transition from one kernel to the next easy. Lustre would just randomly one day stop working with your kernel and only work with some newer version, with no consideration given towards the need for a transition period where it can still compile against both kernels. In the past year I worked (with others like James) to get ldiskfs set up to support multiple versions of kernels supported at the same time. At LLNL we have the lustre tree apply the ldiskfs patches, but we maintain our own kernel independent of Lustre, so we don't let Lustre apply the kernel patches. Therefore I was not particularly motivated to look at improving the patching of the kernel. Since I was driving the ldiskfs changes, they only happened to ldiskfs. Further, since we hope to eliminate the necessity for patching one's kernel soon, it was seen as less important to make that process cleaner. ldiskfs patches, on the other hand, will exist for quite a long time. |
| Comment by Yang Sheng [ 13/Dec/13 ] |
Bugs fixed (https://bugzilla.redhat.com/): 970873 - CVE-2013-2141 Kernel: signal: information leak in tkill/tgkill |
| Comment by Bob Glossman (Inactive) [ 19/Dec/13 ] |
|
the following are needed for client builds on 6.5: in b2_4: http://review.whamcloud.com/8581 |
| Comment by Christopher Morrone [ 02/Jan/14 ] |
|
Yang Sheng, In patch http://review.whamcloud.com/8549 it is still not clear to me why you think it best to copy the ext4_ext_walk_space() function into osd_io.c. That function originates from ext4, so it would seem to me that ldiskfs would be the more appropriate place to reinsert that function. If you add it to ldiskfs for just the RHEL6.5 kernel, you do not need to change all of the other kernels' patch sets. Also, I suspect that longer term the maintenance will be less difficult, because we won't need to worry about having a function in Lustre that needs to be fully compatible with multiple kernels' ext4 implementations. The function can be tweaked as needed for only the kernels that lack that function natively. |
| Comment by Yang Sheng [ 03/Jan/14 ] |
|
Hi Christopher, I think it should be moved to osd, since we can then use one interface for the I/O map and don't need to consider different cases for different distros. That will reduce the maintenance effort and the number of ldiskfs patches. We can also modify walk_space as needed. Anyway, I don't think this is the main issue for the patch. I would like us to decide which interface will be used: map_blocks or walk_space. I am running some tests to measure the performance difference, which I hope will give us a basis for the decision. |
| Comment by Bob Glossman (Inactive) [ 06/Jan/14 ] |
|
The kernel version in 6.5 has been updated to 2.6.32-431.3.1 over the weekend. Since we haven't landed 6.5 support yet I suggest we just change our target to the new version, not submit a separate bug. |
| Comment by John DeSantis [ 16/Jan/14 ] |
|
Bob, I can confirm that the patch offered via the URL http://review.whamcloud.com/#/c/8607/ has functioned without an issue on RHEL 6.x with the new kernel. Thank you for posting that link. John DeSantis |
| Comment by Yang Sheng [ 28/Jan/14 ] |
|
I can confirm that the racer issue Oleg mentioned is related only to RHEL 6.5 itself, but it is still not clear why it happens. What I can say is that mnt_count isn't released, so umount can never succeed. The other thing is that 'ln' is the culprit. Further investigation is needed. BTW: WangDi, your patch looks like it fixes the 'mdc_read_page' error. |
| Comment by Oleg Drokin [ 31/Jan/14 ] |
|
OK, I traced the issue back to the RHEL 6.5 patch adding ESTALE-retry logic. In linkat() they leak nameidata in case of an ESTALE return, which Lustre does during racer. The patch that fixes the issue for me is:
--- fs/namei.c-orig 2014-01-30 19:53:32.885946633 -0500
+++ fs/namei.c 2014-01-30 21:10:31.880946625 -0500
@@ -2897,6 +2897,7 @@ out_release:
path_put(&nd.path);
putname(to);
if (retry_estale(error, how)) {
+ path_put(&old_path);
how |= LOOKUP_REVAL;
goto retry;
}
|
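The leak described above can be illustrated with a small userspace sketch. This is not the kernel code: struct ref, ref_get(), ref_put() and link_with_retry() are hypothetical stand-ins for struct path, the lookup that pins it, path_put() and the retry loop in linkat(). It assumes, as the patch suggests, that each pass through the retry label takes a fresh reference on the source path, and that the retry happens at most once.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a refcounted kernel object such as struct path. */
struct ref {
	int count;
};

static void ref_get(struct ref *r)
{
	r->count++;
}

static void ref_put(struct ref *r)
{
	assert(r->count > 0);
	r->count--;
}

/*
 * Model of an operation with a one-shot ESTALE retry.
 * Every pass through "retry" pins old_path again.
 * fail_times:    number of passes that report -ESTALE before one succeeds.
 * drop_on_retry: false models the leaky code, true models the patched code.
 */
static int link_with_retry(struct ref *old_path, int fail_times,
			   bool drop_on_retry)
{
	bool retried = false;
	int error;

retry:
	ref_get(old_path);			/* lookup of the source pins it */
	error = (fail_times-- > 0) ? -ESTALE : 0;

	if (error == -ESTALE && !retried) {
		if (drop_on_retry)
			ref_put(old_path);	/* the one-line fix */
		retried = true;
		goto retry;
	}

	ref_put(old_path);			/* normal exit path */
	return error;
}
```

With one simulated ESTALE, the leaky variant returns with the count one higher than it started, which in this model plays the role of the reference that keeps umount from ever succeeding; the patched variant returns with the count unchanged.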
| Comment by Oleg Drokin [ 31/Jan/14 ] |
|
RH ticket filed for this: https://bugzilla.redhat.com/show_bug.cgi?id=1059943 |
| Comment by Bob Glossman (Inactive) [ 31/Jan/14 ] |
|
Seems like we're stuck until the upstream fix happens. Even if we added a kernel patch for 6.5, it would only apply in server builds. We would still hit the problem in clients that we build and run on unpatched, pristine kernels. Is there some obvious workaround I'm missing? |
| Comment by Yang Sheng [ 31/Jan/14 ] |
|
Why can't I access the RH ticket? Does it need some permission? |
| Comment by Oleg Drokin [ 03/Feb/14 ] |
|
The RH ticket is restricted to some Intel group for a reason I am not sure of. Anyway, the bug is also present in upstream kernels, so I also sent a fix there and it was already accepted. See here if you are interested in all the details: http://comments.gmane.org/gmane.linux.kernel/1638580 |
| Comment by Peter Jones [ 10/Feb/14 ] |
|
Landed for 2.6 |