[LU-254] e2fsprogs breaks building of initrd on RHEL6 Created: 29/Apr/11 Updated: 26/Oct/11 Resolved: 19/May/11 |
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.1.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Richard Henwood (Inactive) | Assignee: | Andreas Dilger |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Story Points: | 2 |
| Severity: | 3 |
| Rank (Obsolete): | 8537 |
| Description |
|
The RHEL6 way to build initrd is with dracut. If the Lustre e2fsprogs is installed, dracut builds an initrd that won't boot. This issue was discovered by John Hammond at TACC. Lustre e2fsprogs overwrites blkid, and this may be the culprit. To reproduce: 1. vanilla RHEL6. |
| Comments |
| Comment by John Hammond [ 29/Apr/11 ] |
|
Note that force reinstalling the RHEL 6 e2fsprogs (after following 1 through 6) and rerunning dracut will result in a bootable image. |
| Comment by Andreas Dilger [ 30/Apr/11 ] |
|
It is odd that the blkid program should be replaced, since this RPM is using the same .spec file that RHEL6 uses. The .spec runs configure with --disable-blkid, and I don't see it building or installing anything related to blkid. On the other hand, I see it is running configure with --disable-e2initrd-helper, which seems like it might at least be related, or could just be a red herring. One way to investigate this is to unpack the initrd image files and compare the working one to the non-working one. The image files are gzipped cpio archives and can be extracted for examination. Presumably this has nothing to do with the Lustre kernel, and can be reproduced for a stock RHEL kernel with just the updated e2fsprogs? Can you please report any files that differ between the images? Also, what files from e2fsprogs are installed into the initrd? The only one I can imagine is e2fsck, to check the root fs before mounting it, but I'm not even sure about that. |
| Comment by Richard Henwood (Inactive) [ 01/May/11 ] |
|
I extracted the images with:
gunzip < /boot/initrd.img | cpio -i --make-directories
and then ran a quick diff:
[root@localhost img]# diff -ri working/ fail/
Only in working/etc/ld.so.conf.d: kernel-2.6.32-71.el6.x86_64.conf
I copied over kernel-2.6.32-71.el6.x86_64.conf, rebuilt the initrd, and it booted. |
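The comparison above can also be scripted when several images need checking. The following is a minimal sketch, assuming both initrd images have already been unpacked into two directories (named working/ and fail/ in the comment above). The function name only_in_one_tree is hypothetical. It reports only entries that exist in one tree and not the other, which is the kind of difference the diff output showed; it does not compare file contents.

```python
import filecmp
import os


def only_in_one_tree(left, right):
    """Return sorted (side, relative_path) pairs for entries found in only one tree.

    'left' and 'right' are directories holding two unpacked initrd images
    (hypothetical layout, e.g. working/ and fail/).
    """
    results = []

    def walk(cmp, rel):
        # Entries that exist only under the left tree at this level.
        for name in cmp.left_only:
            results.append(("left", os.path.join(rel, name)))
        # Entries that exist only under the right tree at this level.
        for name in cmp.right_only:
            results.append(("right", os.path.join(rel, name)))
        # Recurse into directories present in both trees.
        for name, sub in cmp.subdirs.items():
            walk(sub, os.path.join(rel, name))

    walk(filecmp.dircmp(left, right), "")
    return sorted(results)


if __name__ == "__main__":
    for side, path in only_in_one_tree("working", "fail"):
        print(f"Only in {side}: {path}")
```

Note that filecmp.dircmp skips a few names by default (for example CVS and RCS directories), so a plain recursive diff remains the reference check.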
| Comment by Andreas Dilger [ 05/May/11 ] |
|
I noticed a problem with the updated e2fsprogs today on my FC13 test system. It was checking a snapshot of the /usr/src filesystem in the background and it marked all of the long symlinks as bad and removed them. It turns out that the ext4 filesystem creates all symlinks with the EXT4_EXTENTS_FL flag enabled, which we've always considered as corrupt for Lustre because extents are only enabled on OSTs, and symlinks are only ever created on the MDT. I'm not sure if this is the root cause of your problem (perhaps there was a long symlink referencing the kernel file that was deleted)? In any case, it should be fixed before using the Lustre e2fsprogs on RHEL6 systems. The patch has been pushed to http://review.whamcloud.com/503 for testing. |
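To see which symlinks on a system could have been affected, one option is to list the long ones before running the unpatched e2fsck. The sketch below is an illustration only and is not part of the patch. It assumes that "long" means a target of 60 bytes or more, on the basis that ext2/3/4 normally store shorter targets inside the inode itself; both that threshold and the function name find_long_symlinks are assumptions made here.

```python
import os

# Assumption: targets shorter than 60 bytes fit inside the inode ("fast"
# symlinks); longer targets are stored in a separate block.
FAST_SYMLINK_MAX = 60


def find_long_symlinks(root):
    """Return sorted (path, target) pairs for symlinks with long targets under root."""
    found = []
    # os.walk does not follow symlinks by default, so links to directories
    # are listed in dirnames and are not descended into.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                target = os.readlink(path)
                # Measure the target in bytes, as it is stored on disk.
                if len(os.fsencode(target)) >= FAST_SYMLINK_MAX:
                    found.append((path, target))
    return sorted(found)


if __name__ == "__main__":
    for path, target in find_long_symlinks("/usr/src"):
        print(f"{path} -> {target}")
```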
| Comment by Andreas Dilger [ 09/May/11 ] |
|
Richard, John, I've committed a fix via http://review.whamcloud.com/503 which may fix this problem. It fixes a problem with long symlinks in our e2fsprogs that would only appear for RHEL6-based installations using ext4 for the base filesystems. I'm not sure if that is the root of the problem, but I couldn't find any other problems related to e2fsprogs on RHEL6 that might cause problems with initrd image generation. Could you please give the new packages a try? |
| Comment by Build Master (Inactive) [ 09/May/11 ] |
|
Integrated in Andreas Dilger : 30b1d6a5221f2cfbafe1701cb91821349b0bffd0
|
| Comment by Richard Henwood (Inactive) [ 09/May/11 ] |
|
dracut now works for me - I'll ask John tomorrow if he also has success with this release. |
| Comment by Andreas Dilger [ 16/May/11 ] |
|
John, have you had any chance to test if this change fixes the problem that you were seeing? You should be able to use the e2fsprogs packages built to fix this problem without any need to force installation or have any packaging conflicts. They can be downloaded for x86_64 EL6 from: http://newbuild.whamcloud.com/job/e2fsprogs-master/arch=x86_64,distro=el6/23/ |
| Comment by John Hammond [ 19/May/11 ] |
|
Success! After the following steps, I no longer experienced the "No root device found" issue in the ram disk's init. 1) Reinstall util-linux-ng from the rhel6 ISO. |
| Comment by Andreas Dilger [ 19/May/11 ] |
|
Patch has landed to e2fsprogs-1.41.90.wc1 |