[LU-11246] New lustre e2fsprogs 1.44 issues Created: 14/Aug/18 Updated: 17/Oct/18 Resolved: 17/Oct/18 |
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical |
| Reporter: | James A Simmons | Assignee: | Dongyang Li |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Building and using the latest modified e2fsprogs for Lustre. |
||
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Because I need to build e2fsprogs for ARM and Power8, I also attempted to build it for my general RHEL7 systems, and I discovered a few things. First, I think we can eliminate the extra spec files: if you install lsb_release for your distro, it will build with the default spec file (a yum install redhat-release will do it). Second, on my build machine I have libfuse-devel installed, which broke the e2fsprogs build; even ./configure --disable-fuse2fs didn't help. Lastly, the disks on my new target testbed are 75TB in size, so when I attempted to create a file system I saw the following error:
mke2fs 1.44.3.wc1 (23-July-2018)
mke2fs: Size of device (0x448000000 blocks) /dev/mapper/crius-ddn-l12 too big to be expressed in 32 bits using a blocksize of 4096.
It looks like an "-O 64bit" option will be needed. |
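The 32-bit limit in that error can be checked by hand. Below is a minimal sketch (the helper names are hypothetical, not taken from e2fsprogs) that converts the block count from the message into bytes and compares it with the largest size that 32-bit block numbers can address at a 4096-byte block size:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: device size in bytes for a block count and block size. */
static uint64_t device_bytes(uint64_t blocks, uint32_t block_size)
{
    assert(block_size > 0);
    return blocks * (uint64_t)block_size;
}

/* Hypothetical helper: largest size addressable with 32-bit block numbers. */
static uint64_t max_bytes_32bit(uint32_t block_size)
{
    assert(block_size > 0);
    /* 2^32 possible block numbers, each covering block_size bytes. */
    return ((uint64_t)UINT32_MAX + 1) * (uint64_t)block_size;
}
```

With the numbers from the error, device_bytes(0x448000000, 4096) is 75,316,546,502,656 bytes (about 75.3 TB, or 68.5 TiB), while max_bytes_32bit(4096) is 2^44 bytes (16 TiB), so this device can only be formatted with 64-bit block numbers.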
| Comments |
| Comment by Andreas Dilger [ 14/Aug/18 ] |
Firstly, I suspect that the yum install will not work for SLES, and it isn't clear whether it would work for RHEL6. How does it know to get the right .spec file for the distro? Also, I suspect the vendor .spec file will not have the correct contact URLs and such, so it isn't clear to me how that would work. I'm not against reducing our maintenance overhead by eliminating patches, but you'd need to explain a bit more about what this is doing before I can judge whether it is actually something we want to change.
lidongyang had a patch to add a configure check for libfuse-devel to skip this if it wasn't available. Maybe that is missing from the version that you were using? Or is the problem that this configure check is missing something and it still fails to build when libfuse-devel is not installed?
I think we already add "-O 64bit" from mkfs.lustre by default for OSTs. Are you running mke2fs directly on the devices?

    static int enable_default_ext4_features()
    {
            /* Enable large block addresses if the LUN is over 2^32 blocks. */
            if ((mop->mo_device_kb / (L_BLOCK_SIZE >> 10) > UINT32_MAX) &&
                is_e2fsprogs_feature_supp("-O 64bit") == 0)
                    enable_64bit = 1;

This has been the default even for MDTs since commit v2_10_58_0-123-geb65c3a, patch https://review.whamcloud.com/31037
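To make the quoted condition easy to try outside mkfs.lustre, here is a hypothetical stand-alone version. The parameter names are assumptions: device_kb stands in for mop->mo_device_kb, block_size for L_BLOCK_SIZE, and a plain boolean replaces the is_e2fsprogs_feature_supp("-O 64bit") probe. It is a sketch of the same arithmetic, not the Lustre function itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-alone version of the quoted check:
 *   device_kb        stands in for mop->mo_device_kb
 *   block_size       stands in for L_BLOCK_SIZE
 *   mke2fs_has_64bit stands in for is_e2fsprogs_feature_supp("-O 64bit") == 0
 */
static bool want_64bit(uint64_t device_kb, uint32_t block_size,
                       bool mke2fs_has_64bit)
{
    assert(block_size >= 1024 && block_size % 1024 == 0);

    /* Same arithmetic as the excerpt: KiB divided by KiB per block. */
    uint64_t blocks = device_kb / (block_size >> 10);

    /* Only request 64bit when the LUN needs it and mke2fs can provide it. */
    return blocks > UINT32_MAX && mke2fs_has_64bit;
}
```

For the 0x448000000-block LUN from the description, the size is 73,551,314,944 KiB, so want_64bit(73551314944, 4096, true) is true, whereas the 250000 KiB test MDT stays well under the limit.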
| Comment by Dongyang Li [ 15/Aug/18 ] |
|
We do need the specs for the platforms to handle their differences; e.g. several test cases will always fail on a default SLES11 build box because it picks ext3 as the root filesystem, so we need to apply extra patches and skip some tests for SLES only. When I first made the specs I put fuse-devel in as a build dependency, but 1. fuse2fs is not useful to us, as it cannot mount a filesystem with ea_inode, and 2. the build boxes from Jenkins don't have fuse-devel by default, so I've removed it from the specs. However, --disable-fuse2fs should go into the configure options in the specs; I've just fixed that in the git repo. |
| Comment by James A Simmons [ 15/Aug/18 ] |
|
Which repo did you push the fuse2fs fix to? I'm working off the master-lustre-test branch. @Andreas: I'm building the file system with mkfs.lustre using the tip of pre-2.12. |
| Comment by Dongyang Li [ 16/Aug/18 ] |
|
James, it's the same master-lustre-test branch; do a git pull or clone and you should be able to see it. |
| Comment by James A Simmons [ 17/Aug/18 ] |
|
Last commit I see is "e2fsprogs: fix compile error and warnings for old gcc versions". Which commit is it? |
| Comment by Dongyang Li [ 17/Aug/18 ] |
|
I was overwriting the commits, since we want to keep the number of patches on top of upstream e2fsprogs to a minimum. One way to check is to look in the spec.in file and see whether you have --disable-fuse2fs in the configure section. |
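The check described above is a plain text search. As a sketch only (the helper name is hypothetical, and it deliberately scans the whole file rather than just the configure invocation), it could look like this:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: report whether any line of an open spec file
 * contains `flag` (for example "--disable-fuse2fs").  Simplifications:
 * it scans the whole file, not only the configure options, and a line
 * longer than the buffer is read in pieces, so a flag straddling a
 * piece boundary would be missed.
 */
static bool spec_mentions_flag(FILE *fp, const char *flag)
{
    char line[4096];

    assert(fp != NULL && flag != NULL);
    while (fgets(line, sizeof(line), fp) != NULL) {
        if (strstr(line, flag) != NULL)
            return true;
    }
    return false;
}
```

To use it, fopen() the spec.in file in read mode and pass the stream. From a shell, grep on the same file answers the same question.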
| Comment by James A Simmons [ 21/Aug/18 ] |
|
I managed to get my Power8 nodes running the Lustre server code, but using the latest e2fsprogs I'm seeing the following:
[ 1049.122763] Lustre: DEBUG MARKER: mkfs.lustre --mgsnode=10.37.202.6@o2ib1 --fsname=lustre --mdt --index=0 --param=sys.timeout=20 --param=mdt.identity_upcall=/usr/sbin/l_getidentity --backfstype=ldiskfs --device-size=250000 --reformat /dev/nvme0n1
[ 1049.486023] LDISKFS-fs (nvme0n1): Couldn't mount because of unsupported optional features (400)
Was something missed? |
| Comment by Dongyang Li [ 22/Aug/18 ] |
|
400 is EXT4_FEATURE_INCOMPAT_EA_INODE (the kernel prints this mask in hex, so it means 0x0400), which should be supported by ldiskfs. What does the mkfs.lustre output say, e.g. the options it passed to mke2fs? Also, what does dumpe2fs -h say? |
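Reading the number in that kernel message as a hexadecimal INCOMPAT mask, a small lookup table maps each bit to the feature name that dumpe2fs prints. This is a sketch: the table and helper names are hypothetical, the bit values follow ext4.h, dirdata (0x1000) is the Lustre ldiskfs addition, and the table covers only a subset of features.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct incompat_feature {
    uint32_t bit;
    const char *name; /* name as printed by dumpe2fs */
};

/* Subset of ext4 INCOMPAT feature bits; dirdata is the Lustre ldiskfs bit. */
static const struct incompat_feature incompat_features[] = {
    { 0x0002, "filetype" },
    { 0x0004, "needs_recovery" },
    { 0x0010, "meta_bg" },
    { 0x0040, "extent" },
    { 0x0080, "64bit" },
    { 0x0100, "mmp" },
    { 0x0200, "flex_bg" },
    { 0x0400, "ea_inode" },
    { 0x1000, "dirdata" },
    { 0x4000, "large_dir" },
    { 0x8000, "inline_data" },
};

#define N_INCOMPAT (sizeof(incompat_features) / sizeof(incompat_features[0]))

/* Name for a single feature bit, or NULL if it is not in the table. */
static const char *incompat_name(uint32_t bit)
{
    assert(bit != 0 && (bit & (bit - 1)) == 0); /* exactly one bit set */
    for (size_t i = 0; i < N_INCOMPAT; i++)
        if (incompat_features[i].bit == bit)
            return incompat_features[i].name;
    return NULL;
}

/* Reverse lookup: bit for a feature name, or 0 if the name is unknown. */
static uint32_t incompat_bit(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < N_INCOMPAT; i++)
        if (strcmp(incompat_features[i].name, name) == 0)
            return incompat_features[i].bit;
    return 0;
}
```

For example, incompat_name(0x400) returns "ea_inode", the feature the kernel refused in the log above.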
| Comment by James A Simmons [ 06/Sep/18 ] |
|
mkfs.lustre --mgsnode=172.30.224.8@tcp --fsname=lustre --mdt --index=0 --param=sys.timeout=20 --param=mdt.identity_upcall=/usr/sbin/l_getidentity --backfstype=ldiskfs --device-size=250000 -vvv --reformat /dev/nvme0n1
Permanent disk data:
Target:     lustre:MDT0000
Index:      0
Lustre FS:  lustre
Mount type: ldiskfs
Flags:      0x61
              (MDT first_time update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters: mgsnode=172.30.224.8@tcp sys.timeout=20 mdt.identity_upcall=/usr/sbin/l_getidentity
device size = 763097MB
formatting backing filesystem ldiskfs on /dev/nvme0n1
    target name   lustre:MDT0000
    4k blocks     62500
    options        -I 1024 -i 2560 -O dirdata,uninit_bg,^extents,quota,huge_file,ea_inode,flex_bg -E lazy_journal_init -F
mkfs_cmd = mke2fs -j -b 4096 -L lustre:MDT0000 -I 1024 -i 2560 -O dirdata,uninit_bg,^extents,quota,huge_file,ea_inode,flex_bg -E lazy_journal_init -F /dev/nvme0n1 62500
cmd: mke2fs -j -b 4096 -L lustre:MDT0000 -I 1024 -i 2560 -O dirdata,uninit_bg,^extents,quota,huge_file,ea_inode,flex_bg -E lazy_journal_init -F /dev/nvme0n1 62500
mke2fs 1.44.3.wc1 (23-July-2018)
/dev/nvme0n1 contains a ext4 file system labelled 'lustre:MDT0000'
    created on Wed Sep 5 21:17:25 2018
Discarding device blocks: done
Creating filesystem with 62500 4k blocks and 100000 inodes
Filesystem UUID: 509625e9-8319-4f61-87cf-bd60c2e2771f
....
Filesystem volume name:   lustre:MDT0000
Last mounted on:          <not available>
Filesystem UUID:          509625e9-8319-4f61-87cf-bd60c2e2771f
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype flex_bg ea_inode dirdata sparse_super large_file huge_file uninit_bg quota
Filesystem flags:         unsigned_directory_hash
Default mount options:    user_xattr acl
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Now, if I run the following command directly, that works:
mke2fs -j -b 4096 -L lustre:MDT0000 -I 1024 -i 2560 -O dirdata,uninit_bg,^extents,quota,huge_file,ea_inode,flex_bg -E lazy_journal_init -F /dev/nvme0n1 62500 |
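The geometry in that mke2fs output follows directly from the arguments. mkfs.lustre's --device-size is given in KiB, and mke2fs -i 2560 sets the bytes-per-inode ratio. The sketch below uses hypothetical helper names to reproduce the 62500 blocks and 100000 inodes reported above. The helpers ignore any per-group rounding mke2fs may apply; for this filesystem the reported inode count equals the unrounded value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of block_size blocks for a size in KiB. */
static uint64_t blocks_for_device_kb(uint64_t device_kb, uint32_t block_size)
{
    assert(block_size >= 1024 && block_size % 1024 == 0);
    return device_kb / (block_size / 1024);
}

/* Hypothetical helper: inode count implied by mke2fs -i, before rounding. */
static uint64_t inodes_for_ratio(uint64_t blocks, uint32_t block_size,
                                 uint32_t bytes_per_inode)
{
    assert(bytes_per_inode > 0);
    return blocks * block_size / bytes_per_inode;
}
```

With --device-size=250000 and -b 4096, blocks_for_device_kb() gives 62500 blocks (256,000,000 bytes), and inodes_for_ratio(62500, 4096, 2560) gives 100000 inodes, matching "Creating filesystem with 62500 4k blocks and 100000 inodes".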
| Comment by James A Simmons [ 06/Sep/18 ] |
|
Also, in the RHEL7 alt kernel I do see this in ext4.h:
#define EXT4_FEATURE_INCOMPAT_EA_INODE          0x0400 /* EA in inode */ |
| Comment by Andreas Dilger [ 06/Sep/18 ] |
|
Sorry, it isn't clear from your next most recent comment what the problem is. It looks to me like the mkfs.lustre command ran OK. As for the feature flag definition in RHEL 7.3, that was just reserved for future use in that kernel; it doesn't mean the feature is actually supported by ext4. It would need to be in the EXT4_FEATURE_INCOMPAT_SUPP mask for the kernel to allow the filesystem to mount. It seems like you don't have ext4-large-eas.patch applied. |
| Comment by James A Simmons [ 06/Sep/18 ] |
|
You are right. Looking at the RHEL7 alt kernel, most of the ea_inode work landed but not all of it; only bits and pieces are missing. |
| Comment by James A Simmons [ 06/Sep/18 ] |
|
Okay, I found the problem. For the RHEL7 alt kernel 4.14.0-49.6.1.el7a.ppc64le, all of the large EA work landed except for adding EXT4_FEATURE_INCOMPAT_EA_INODE to EXT4_FEATURE_INCOMPAT_SUPP, so it was a kernel bug. I have to talk to Oleg to see if the ARM kernel he is using has this bug as well.
|
| Comment by James A Simmons [ 27/Sep/18 ] |
|
The only problem I have seen with e2fsprogs 1.44 is the "LUN is too large" issue. Let me see if I can reproduce it and track down the problem. |
| Comment by Andreas Dilger [ 15/Oct/18 ] |
|
James, we've made the e2fsprogs-1.44.3-wc1 release, with builds available under https://build.whamcloud.com/job/e2fsprogs-master/, but they are not pushed to the download site while they undergo final testing. Could you please explain what the "LUN is too large" issue is? It isn't referenced anywhere else in this ticket. If you aren't having any more problems I'd like to close this, as it is mostly a duplicate of |
| Comment by James A Simmons [ 15/Oct/18 ] |
|
I still need to look into the "LUN is too large" error. |