[LU-5307] e2fsprogs build fails in Centos 7 Created: 08/Jul/14  Updated: 02/Mar/15  Resolved: 28/Oct/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.7.0
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Bob Glossman (Inactive) Assignee: WC Triage
Resolution: Not a Bug Votes: 0
Labels: None
Environment:

Centos 7


Issue Links:
Related
is related to LU-5022 support for 3.10 rhel7 linux kernel Resolved
Severity: 3
Rank (Obsolete): 14825

 Description   

The e2fsprogs build of the master-lustre branch fails on el7/CentOS 7. Even after faking out the build to use the RHEL6 spec file with the following patch

--- a/contrib/build-rpm
+++ b/contrib/build-rpm
@@ -69,6 +69,7 @@ fi
 case "$DISTRO-$RELEASE" in
     RedHatEnterpriseServer-6*) DISTRO=RHEL; RELEASE=6;;
     CentOS-6*) DISTRO=RHEL; RELEASE=6;;
+    CentOS-7*) DISTRO=RHEL; RELEASE=6;;
     Fedora-1[234]) DISTRO=RHEL; RELEASE=6;;    # use the same .spec for now
 esac

the build fails. Plain 'make' works fine, but 'make rpm' fails a couple of the test cases that run during the build. Example errors:

  .
  .
  .
f_uninit_disable: disable uninit_bg feature: ok
f_bbfile: bad blocks in files: ok
168 tests succeeded	2 tests failed
Tests failed: r_64bit_big_expand r_ext4_big_expand 
make[2]: *** [test_post] Error 1
make[2]: Leaving directory `/home/bogl/rb/BUILD/e2fsprogs-1.42.9.wc1/tests'
make[1]: *** [check-recursive] Error 1
make[1]: Leaving directory `/home/bogl/rb/BUILD/e2fsprogs-1.42.9.wc1'
error: Bad exit status from /var/tmp/rpm-tmp.ZDx19k (%check)


RPM build errors:
    bogus date in %changelog: Mon Jul 13 2010 Eric Sandeen <sandeen@redhat.com> 1.41.12-5
    bogus date in %changelog: Thu Sep 14 2009 Eric Sandeen <sandeen@redhat.com> 1.41.9-3
    bogus date in %changelog: Fri Aug 05 2009 Eric Sandeen <sandeen@redhat.com> 1.41.8-6
    bogus date in %changelog: Mon Oct 03 2008 Eric Sandeen <sandeen@redhat.com> 1.41.3-2
    bogus date in %changelog: Mon Oct 03 2008 Eric Sandeen <sandeen@redhat.com> 1.41.3-1
    bogus date in %changelog: Mon Jan 10 2008 Eric Sandeen <sandeen@redhat.com> 1.40.4-4
    bogus date in %changelog: Tue Jan 09 2008 Eric Sandeen <sandeen@redhat.com> 1.40.4-3
    bogus date in %changelog: Tue Oct 15 2007 Eric Sandeen <esandeen@redhat.com> 1.40.2-9
    Bad exit status from /var/tmp/rpm-tmp.ZDx19k (%check)
make: *** [rpm] Error 1

Not a blocker right now while only client builds are targeted, but it will become critical later on. A working e2fsprogs will be needed for full server functionality.



 Comments   
Comment by Andreas Dilger [ 09/Jul/14 ]

I think I already have this fixed in my tree. Was just waiting for upstream e2fsprogs to be updated to 1.42.11 before pushing a new release.

Comment by Bob Glossman (Inactive) [ 23/Jul/14 ]

While the most recent refreshes of http://review.whamcloud.com/11147 build fine on el7 in Jenkins, they fail 2 of the runtime tests during 'make rpm' in my manual builds on el7. The failing tests are r_64bit_big_expand and r_ext4_big_expand. It is not at all clear what is different about the builds.

r_64bit_big_expand.failed:

very large fs growth using ext4 w/64bit starting
0+0 records in
0+0 records out
0 bytes (0 B) copied, 3.6279e-05 s, 0.0 kB/s
using /tmp/e2fsprogs-tmp-r_64bit_big_expand.mn44uq
../misc/mke2fs -t ext4 -O 64bit -qF /tmp/e2fsprogs-tmp-r_64bit_big_expand.mn44uq 512M
../tests/progs/crcsum /tmp/csum-tmp.JgasXw
Checksum is 445001071
Setting up file system
debugfs 1.42.11.wc1 (24-Jul-2014)
debugfs:  mkdir test
debugfs:  cd test
debugfs:  write /tmp/csum-tmp.JgasXw e2fsck
Allocated inode: 13
debugfs:  ls /test
 12  (12) .    2  (12) ..    13  (4072) e2fsck   
debugfs:  stat /test/e2fsck
Inode: 13   Type: regular    Mode:  0600   Flags: 0x80000
Generation: 0    Version: 0x00000000:00000000
User:     0   Group:     0   Size: 1222732
File ACL: 0    Directory ACL: 0
Links: 1   Blockcount: 2392
Fragment:  Address: 0    Number: 0    Size: 0
 ctime: 0x53cfcbe9:00000000 -- Wed Jul 23 14:51:21 2014
 atime: 0x53cfcbe9:00000000 -- Wed Jul 23 14:51:21 2014
 mtime: 0x53cfcbe9:00000000 -- Wed Jul 23 14:51:21 2014
crtime: 0x53cfcbe9:00000000 -- Wed Jul 23 14:51:21 2014
Size of extra inode fields: 28
EXTENTS:
(0-5):75-80, (6-17):85-96, (18-298):2146-2426
debugfs:  quit
 
../e2fsck/e2fsck -fy /tmp/e2fsprogs-tmp-r_64bit_big_expand.mn44uq
e2fsck 1.42.11.wc1 (24-Jul-2014)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong (17179993603, counted=124419).
Fix? yes


/tmp/e2fsprogs-tmp-r_64bit_big_expand.mn44uq: ***** FILE SYSTEM WAS MODIFIED *****
/tmp/e2fsprogs-tmp-r_64bit_big_expand.mn44uq: 13/32768 files (7.7% non-contiguous), 6653/131072 blocks
../resize/resize2fs -d 63 /tmp/e2fsprogs-tmp-r_64bit_big_expand.mn44uq 2T
resize2fs 1.42.11.wc1 (24-Jul-2014)
fs has 13 inodes, 1 groups required.
fs requires 4402 data blocks.
With 1 group(s), we have 30517 blocks available.
Last group's overhead is 2251
Need 4402 data blocks in last group
Final size of last group is 6653
Estimated blocks needed: 6653
Extents safety margin: 13
Resizing the filesystem on /tmp/e2fsprogs-tmp-r_64bit_big_expand.mn44uq to 536870912 (4k) blocks.
read_bitmaps: Memory used: 132k/0k (64k/69k), time:  0.00/ 0.00/ 0.00
read_bitmaps: I/O read: 1MB, write: 0MB, rate: 22727.27MB/s
fix_uninit_block_bitmaps 1: Memory used: 132k/0k (64k/69k), time:  0.00/ 0.00/ 0.00
../resize/resize2fs: Attempt to write block to filesystem resulted in short write while trying to resize /tmp/e2fsprogs-tmp-r_64bit_big_expand.mn44uq
Please run 'e2fsck -fy /tmp/e2fsprogs-tmp-r_64bit_big_expand.mn44uq' to fix the filesystem
after the aborted resize operation.
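
A quick arithmetic check on the numbers in the log above may help when reading it. This is only an observation, not a diagnosis made anywhere in this ticket: the gap between the recorded and counted free block counts is exactly 2^34, and the 131072 and 536870912 block counts correspond to 512M and 2T at a 4k block size. A minimal sketch, assuming a shell with 64-bit integer arithmetic:

```shell
#!/bin/sh
# Values copied from the r_64bit_big_expand log above.
recorded=17179993603   # "Free blocks count wrong (17179993603, ..."
counted=124419         # "... counted=124419)"
blocksize=4096         # resize2fs reports "(4k) blocks"

# Difference between the recorded and the counted free block counts.
free_diff=$(( recorded - counted ))

# Block counts implied by the 512M and 2T sizes at a 4k block size.
blocks_512m=$(( 512 * 1024 * 1024 / blocksize ))
blocks_2t=$(( 2 * 1024 * 1024 * 1024 * 1024 / blocksize ))

echo "free_diff=$free_diff two_pow_34=$(( 1 << 34 ))"
echo "blocks_512m=$blocks_512m blocks_2t=$blocks_2t"
```

Note that only the 64bit variant of the test reports the wrong free block count; the r_ext4_big_expand log below passes e2fsck cleanly but fails at the same resize2fs step.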

r_ext4_big_expand.failed:

very large fs growth using ext4 starting
0+0 records in
0+0 records out
0 bytes (0 B) copied, 3.7817e-05 s, 0.0 kB/s
using /tmp/e2fsprogs-tmp-r_ext4_big_expand.3DRSoy
../misc/mke2fs -t ext4 -qF /tmp/e2fsprogs-tmp-r_ext4_big_expand.3DRSoy 512M
../tests/progs/crcsum /tmp/csum-tmp.7VPqSJ
Checksum is 2560589454
Setting up file system
debugfs 1.42.11.wc1 (24-Jul-2014)
debugfs:  mkdir test
debugfs:  cd test
debugfs:  write /tmp/csum-tmp.7VPqSJ e2fsck
Allocated inode: 13
debugfs:  ls /test
 12  (12) .    2  (12) ..    13  (4072) e2fsck   
debugfs:  stat /test/e2fsck
Inode: 13   Type: regular    Mode:  0600   Flags: 0x80000
Generation: 0    Version: 0x00000000:00000000
User:     0   Group:     0   Size: 1222732
File ACL: 0    Directory ACL: 0
Links: 1   Blockcount: 2392
Fragment:  Address: 0    Number: 0    Size: 0
 ctime: 0x53cfcb9e:00000000 -- Wed Jul 23 14:50:06 2014
 atime: 0x53cfcb9e:00000000 -- Wed Jul 23 14:50:06 2014
 mtime: 0x53cfcb9e:00000000 -- Wed Jul 23 14:50:06 2014
crtime: 0x53cfcb9e:00000000 -- Wed Jul 23 14:50:06 2014
Size of extra inode fields: 28
EXTENTS:
(0-5):43-48, (6-17):53-64, (18-298):2114-2394
debugfs:  quit
 
../e2fsck/e2fsck -fy /tmp/e2fsprogs-tmp-r_ext4_big_expand.3DRSoy
e2fsck 1.42.11.wc1 (24-Jul-2014)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/tmp/e2fsprogs-tmp-r_ext4_big_expand.3DRSoy: 13/32768 files (7.7% non-contiguous), 6557/131072 blocks
../resize/resize2fs -d 63 /tmp/e2fsprogs-tmp-r_ext4_big_expand.3DRSoy 2T
resize2fs 1.42.11.wc1 (24-Jul-2014)
fs has 13 inodes, 1 groups required.
fs requires 4402 data blocks.
With 1 group(s), we have 30613 blocks available.
Last group's overhead is 2155
Need 4402 data blocks in last group
Final size of last group is 6557
Estimated blocks needed: 6557
Extents safety margin: 13
Resizing the filesystem on /tmp/e2fsprogs-tmp-r_ext4_big_expand.3DRSoy to 536870912 (4k) blocks.
read_bitmaps: Memory used: 132k/0k (64k/69k), time:  0.00/ 0.00/ 0.00
read_bitmaps: I/O read: 1MB, write: 0MB, rate: 27027.03MB/s
fix_uninit_block_bitmaps 1: Memory used: 132k/0k (64k/69k), time:  0.00/ 0.00/ 0.00
../resize/resize2fs: Attempt to write block to filesystem resulted in short write while trying to resize /tmp/e2fsprogs-tmp-r_ext4_big_expand.3DRSoy
Please run 'e2fsck -fy /tmp/e2fsprogs-tmp-r_ext4_big_expand.3DRSoy' to fix the filesystem
after the aborted resize operation.
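
To try the same sequence by hand in a directory of one's choosing, the commands visible in the two logs above can be collected as below. This is a simplified sketch, not the actual test script: it skips the debugfs step that writes a file into the image, the image name `big_expand.img` and the helper `big_expand_cmds` are hypothetical, and it assumes mke2fs, e2fsck and resize2fs are on PATH (the tests use the freshly built binaries under ../misc, ../e2fsck and ../resize). It also assumes the directory path contains no spaces.

```shell
#!/bin/sh
# Print the commands seen in the failing test logs for an image file in
# the directory given as $1. Printing rather than running keeps the
# sketch harmless until the commands have been reviewed.
big_expand_cmds() {
    img="$1/big_expand.img"   # hypothetical image name
    echo ": > $img"                               # start from an empty file
    echo "mke2fs -t ext4 -O 64bit -qF $img 512M"  # 512M ext4 image, 64bit feature
    echo "e2fsck -fy $img"                        # check before resizing
    echo "resize2fs -d 63 $img 2T"                # grow to 2T with debug flags
}

# Default to /tmp, which is where the logs show the test images.
big_expand_cmds "${1:-/tmp}"
```

After reviewing the output, the commands can be run with `big_expand_cmds /some/dir | sh`. Dropping `-O 64bit` from the mke2fs line gives the r_ext4_big_expand variant.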
Comment by Andreas Dilger [ 30/Jul/14 ]

It turns out that there was a bug in the upstream e2fsprogs-1.42.11 release which caused small filesystems to fail at mkfs time, especially if they specified a high inode count or large inodes (which both our sanity-lfsck and conf-sanity tests do). The good news is that I've found and fixed this (http://git.whamcloud.com/tools/e2fsprogs.git/commit/adf996d0f1afd574c9cc0782a95d0995e0664abd) and pushed it upstream, so it will be in the upcoming 1.42.12 release and our own 1.42.11.wc2 release.

The e2fsprogs-RHEL-7.spec.in file does not build the lfsck tool, since it is now wholly obsolete and there are no versions of RHEL 7 where the old version of lfsck was available. For distros where the e2fsprogs-based lfsck is still being built, it will also refuse to run on Lustre 2.6 and later, or if it detects that the in-kernel 2.6 LFSCK has run MDT-OST consistency checking (by checking for the presence of the lfsck-layout file).

I also had to revert some autoconf changes so that e2fsprogs could still build on RHEL5 (http://git.whamcloud.com/tools/e2fsprogs.git/commit/5ce5311d609967a903d3b4cdcbf6f66e939e2d7f), but the change does not have any effect on functionality either way.

The r_{64bit,ext4}_big_expand test failures are hopefully addressed by other patches landed since e2fsprogs-1.42.11 was tagged.

Comment by Bob Glossman (Inactive) [ 27/Aug/14 ]

The r_{64bit,ext4}_big_expand test failures persisted even with the latest changes. However, it looks like they were only due to the environment I was running the tests in. The default fs type in an el7 install is xfs, not ext2/3/4. Building e2fsprogs and running the tests on a root fs that is xfs failed every time. Once I figured out how to install el7 with a non-default ext4 root fs, e2fsprogs built fine and all the runtime tests passed.

Since Jenkins builds of e2fsprogs have been succeeding all along, I'm guessing that the el7 builders in our clusters were always installed with ext4, not xfs, and that's why it worked there.
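
For anyone hitting the same failures, a pre-build check along these lines could flag the situation early. It is a minimal sketch under stated assumptions: it relies on GNU coreutils `stat -f -c %T`, the helper names `fs_type_of` and `is_known_bad_fs` are hypothetical, and checking the temporary directory is a guess based on the logs showing test images under /tmp (the comment above only establishes that an xfs root fs failed).

```shell
#!/bin/sh
# Report the filesystem type backing a directory (GNU coreutils stat).
# GNU stat prints "ext2/ext3" for ext2, ext3 and ext4 alike, so the check
# below matches on xfs rather than trying to recognise ext4.
fs_type_of() {
    stat -f -c %T "$1"
}

# Return 0 for the one type reported in this ticket to fail the
# r_*_big_expand tests (xfs), and 1 for anything else.
is_known_bad_fs() {
    case "$1" in
        xfs) return 0 ;;
        *)   return 1 ;;
    esac
}

dir="${TMPDIR:-/tmp}"
fstype="$(fs_type_of "$dir" 2>/dev/null || echo unknown)"
if is_known_bad_fs "$fstype"; then
    echo "warning: $dir is on $fstype; r_*_big_expand tests may fail"
else
    echo "$dir is on $fstype"
fi
```

The check only warns and does not stop the build, since other filesystem types were not evaluated in this ticket.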
