[LU-11246] New lustre e2fsprogs 1.44 issues Created: 14/Aug/18  Updated: 17/Oct/18  Resolved: 17/Oct/18

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Critical
Reporter: James A Simmons Assignee: Dongyang Li
Resolution: Duplicate Votes: 0
Labels: None
Environment:

Building and using the latest modified e2fsprogs for Lustre.


Issue Links:
Blocker
is blocked by LU-11268 mdc_intent_getxattr_pack() allocates ... Resolved
Related
is related to LU-1732 enable wide striping by default Resolved
is related to LU-6387 Add Power8 support to Lustre Resolved
is related to LU-11200 Centos 8 arm64 server support Resolved
is related to LU-11215 conf-sanity test_61: Invalid filesyst... Resolved
is related to LU-10997 Ubuntu 18 support Resolved
is related to LU-11440 Make e2fsprogs-1.44.3-wc1 release Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

With the need to build e2fsprogs for ARM and Power8 I also attempted to build it for my general RHEL7 systems. I discovered a few things. First, I think we can eliminate the extra spec files. If you install lsb_release for your distro it will build with the default spec file. A yum install redhat-release will do it. On my build machine I have libfuse-devel installed, which broke the e2fsprogs build. Even a ./configure --disable-fuse2fs didn't help. Lastly, for my new target testbed the disks are 75TB in size, so when I attempted to build a file system I saw the following error: 

mke2fs 1.44.3.wc1 (23-July-2018)

mke2fs: Size of device (0x448000000 blocks) /dev/mapper/crius-ddn-l12 too big to be expressed in 32 bits using a blocksize of 4096.

It looks like a -o 64 option will be needed.



 Comments   
Comment by Andreas Dilger [ 14/Aug/18 ]

With the need to build e2fsprogs for ARM and Power8 I also attempted to build it for my general RHEL7 systems. I discovered a few things. First I think we can eliminate the extra spec files. If you install lsb_release for your distro it will build with the default spec file. A yum install redhat-release will do it.

Firstly, I suspect that the yum install will not work for SLES, and it isn't clear if it would work for RHEL6? How does it know to get the right .spec file for the distro? Also, I suspect the vendor .spec file will not have the correct contact URLs and such, so it isn't clear to me how that would work. I'm not against reducing our maintenance overhead by eliminating patches, but you'd need to explain a bit more about what this is doing before I can understand whether it is actually something we want to change.

On my build machine I have libfuse-devel installed, which broke the e2fsprogs build. Even a ./configure --disable-fuse2fs didn't help.

lidongyang had a patch to add a configure check for libfuse-devel to skip this if it wasn't available. Maybe that is missing from the version that you were using? Or is the problem that this configure check is missing something and it still fails to build when libfuse-devel is not installed?

Lastly, for my new target testbed the disks are 75TB in size, so when I attempted to build a file system I saw the following error:

mke2fs 1.44.3.wc1 (23-July-2018)

mke2fs: Size of device (0x448000000 blocks) /dev/mapper/crius-ddn-l12 too big to be expressed in 32 bits using a blocksize of 4096.

It looks like a -o 64 option will be needed.

I think we already add "-O 64bit" from mkfs.lustre by default for OSTs. Are you running mke2fs directly on the devices?

static int enable_default_ext4_features()
{
        /* Enable large block addresses if the LUN is over 2^32 blocks. */
        if ((mop->mo_device_kb / (L_BLOCK_SIZE >> 10) > UINT32_MAX) &&
             is_e2fsprogs_feature_supp("-O 64bit") == 0)
                enable_64bit = 1;

This has been the default even for MDTs since commit v2_10_58_0-123-geb65c3a patch https://review.whamcloud.com/31037 "LU-10520 mkfs: enable extents for big MDT" because even though large MDTs were uncommon in the past, with DoM it is more likely that we will need large MDT filesystems. With a 75TB MDT holding 1B inodes, that is only about 64KB/file once the other filesystem overhead is taken into account.
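As a rough illustration of the size check quoted above, here is a minimal standalone sketch. It is not the mkfs.lustre code: the names sketch_block_count(), sketch_needs_64bit() and SKETCH_BLOCK_SIZE are hypothetical, and the 4096-byte block size is taken from the mke2fs error message in this ticket.

```c
#include <stdint.h>

/* Assumed block size, matching "blocksize of 4096" in the mke2fs error. */
#define SKETCH_BLOCK_SIZE 4096ULL

/* Number of filesystem blocks on a device whose size is given in KiB. */
static uint64_t sketch_block_count(uint64_t device_kb)
{
        return device_kb / (SKETCH_BLOCK_SIZE >> 10);
}

/* Returns 1 when the block count no longer fits in 32 bits, else 0. */
static int sketch_needs_64bit(uint64_t device_kb)
{
        return sketch_block_count(device_kb) > UINT32_MAX;
}
```

For the device in the description, 0x448000000 blocks is 18,387,828,736 blocks, or about 75.3 TB at 4096 bytes per block. That is above the 2^32 (4,294,967,296) block limit, so a check of this shape would report that 64-bit block addresses are needed.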

Comment by Dongyang Li [ 15/Aug/18 ]

We do need the specs for the platforms to overcome the differences,

e.g. several test cases will always fail on a default SLES11 build box because it picks ext3 as the root filesystem. We need to apply extra patches and skip some tests for SLES only.

When I first made the specs I put fuse-devel as a build dependency, but 

1. fuse2fs is not useful to us, as it cannot mount a filesystem with ea_inode

2. the build boxes from Jenkins don't have fuse-devel by default

so I've removed it from the specs. However, --disable-fuse2fs should go into the configure options in the specs. I've just fixed that in the git repo.

Comment by James A Simmons [ 15/Aug/18 ]

Which repo did you push the fuse fix to? I'm working off the master-lustre-test branch.

@Andreas: I'm building the file system with mkfs.lustre using the tip of pre-2.12

Comment by Dongyang Li [ 16/Aug/18 ]

James, it's the same master-lustre-test branch. Do a git pull or clone and you should be able to see it.

Comment by James A Simmons [ 17/Aug/18 ]

Last commit I see is "e2fsprogs: fix compile error and warnings for old gcc versions". Which commit is it?

Comment by Dongyang Li [ 17/Aug/18 ]

I was overwriting the commits, we want to keep the number of patches on top of e2fsprogs upstream to the minimum.

One way to check is to look into the spec.in file, see if you have --disable-fuse2fs in the configure section.

Comment by James A Simmons [ 21/Aug/18 ]

I managed to get my Power8 nodes running the lustre server code but using the latest e2fsprogs I'm seeing the following:

[ 1049.122763] Lustre: DEBUG MARKER: mkfs.lustre --mgsnode=10.37.202.6@o2ib1 --fsname=lustre --mdt --index=0 --param=sys.timeout=20 --param=mdt.identity_upcall=/usr/sbin/l_getidentity --backfstype=ldiskfs --device-size=250000 --reformat /dev/nvme0n1

[ 1049.486023] LDISKFS-fs (nvme0n1): Couldn't mount because of unsupported optional features (400)

Was something missed?

Comment by Dongyang Li [ 22/Aug/18 ]

400 is EXT4_FEATURE_INCOMPAT_EA_INODE, which should be there in ldiskfs

what does the mkfs.lustre output say? like the options it passed to mke2fs?

also what does dumpe2fs -h say?

Comment by James A Simmons [ 06/Sep/18 ]

mkfs.lustre --mgsnode=172.30.224.8@tcp --fsname=lustre --mdt --index=0 --param=sys.timeout=20 --param=mdt.identity_upcall=/usr/sbin/l_getidentity --backfstype=ldiskfs --device-size=250000 -vvv --reformat /dev/nvme0n1

 

   Permanent disk data:   

Target:     lustre:MDT0000

Index:      0

Lustre FS:  lustre

Mount type: ldiskfs

Flags:      0x61

              (MDT first_time update )

Persistent mount opts: user_xattr,errors=remount-ro

Parameters: mgsnode=172.30.224.8@tcp sys.timeout=20 mdt.identity_upcall=/usr/sbin/l_getidentity

 

device size = 763097MB

formatting backing filesystem ldiskfs on /dev/nvme0n1

        target name   lustre:MDT0000

        4k blocks     62500

        options        -I 1024 -i 2560 -O dirdata,uninit_bg,^extents,quota,huge_file,ea_inode,flex_bg -E lazy_journal_init -F

mkfs_cmd = mke2fs -j -b 4096 -L lustre:MDT0000  -I 1024 -i 2560 -O dirdata,uninit_bg,^extents,quota,huge_file,ea_inode,flex_bg -E lazy_journal_init -F /dev/nvme0n1 62500

cmd: mke2fs -j -b 4096 -L lustre:MDT0000  -I 1024 -i 2560 -O dirdata,uninit_bg,^extents,quota,huge_file,ea_inode,flex_bg -E lazy_journal_init -F /dev/nvme0n1 62500

mke2fs 1.44.3.wc1 (23-July-2018)

/dev/nvme0n1 contains a ext4 file system labelled 'lustre:MDT0000'

        created on Wed Sep  5 21:17:25 2018

Discarding device blocks: done                            

Creating filesystem with 62500 4k blocks and 100000 inodes

Filesystem UUID: 509625e9-8319-4f61-87cf-bd60c2e2771f

....

 

Filesystem volume name:   lustre:MDT0000

Last mounted on:          <not available>

Filesystem UUID:          509625e9-8319-4f61-87cf-bd60c2e2771f

Filesystem magic number:  0xEF53

Filesystem revision #:    1 (dynamic)

Filesystem features:      has_journal ext_attr resize_inode dir_index filetype flex_bg ea_inode dirdata sparse_super large_file huge_file uninit_bg quota

Filesystem flags:         unsigned_directory_hash

Default mount options:    user_xattr acl

Filesystem state:         clean

Errors behavior:          Continue

Filesystem OS type:       Linux

 

Now if I run 

mke2fs -j -b 4096 -L lustre:MDT0000  -I 1024 -i 2560 -O dirdata,uninit_bg,^extents,quota,huge_file,ea_inode,flex_bg -E lazy_journal_init -F /dev/nvme0n1 62500

that works
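The numbers in the log above are consistent with each other, which a small sketch can show. It assumes that --device-size is given in KB and that the inode count is simply the filesystem size divided by the -i bytes-per-inode ratio; mke2fs may round per block group in other cases. The function names are hypothetical.

```c
#include <stdint.h>

/* Blocks available on a device of device_kb KiB with the given block size. */
static uint64_t sk_blocks_from_device_kb(uint64_t device_kb, uint64_t block_size)
{
        return device_kb / (block_size / 1024);
}

/* Approximate inode count for a given bytes-per-inode ratio (mke2fs -i). */
static uint64_t sk_inode_estimate(uint64_t blocks, uint64_t block_size,
                                  uint64_t bytes_per_inode)
{
        return blocks * block_size / bytes_per_inode;
}
```

With --device-size=250000 and -b 4096 this gives 62500 blocks, and with -i 2560 it gives 100000 inodes, matching the "Creating filesystem with 62500 4k blocks and 100000 inodes" line.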

Comment by James A Simmons [ 06/Sep/18 ]

Also in the RHEL7 alt kernel I do see in ext4.h 

ext4.h:#define EXT4_FEATURE_INCOMPAT_EA_INODE           0x0400 /* EA in inode */

Comment by Andreas Dilger [ 06/Sep/18 ]

Sorry, it isn't clear from your second most recent comment what the problem is. It looks to me like the mkfs.lustre command was running OK.

As for the feature flag definition in RHEL 7.3, that was just reserved for future use in that kernel; it doesn't mean the feature is actually supported by ext4. It would need to be in the EXT4_FEATURE_INCOMPAT_SUPP mask for the kernel to allow it to mount.

It seems like you don't have ext4-large-eas.patch applied.
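To make the mask check concrete, here is a simplified sketch of how an unsupported incompat bit ends up in the "unsupported optional features (400)" message. The SK_ constants copy the values of the corresponding EXT4_FEATURE_INCOMPAT_* flags. The on-disk feature set is my reading of the dumpe2fs feature list above (filetype, flex_bg, ea_inode, dirdata), and the "supported" masks passed in are hypothetical stand-ins rather than the real EXT4_FEATURE_INCOMPAT_SUPP definition of any kernel.

```c
#include <stdint.h>

/* Incompat feature bits, same values as the EXT4_FEATURE_INCOMPAT_* flags. */
#define SK_INCOMPAT_FILETYPE 0x0002U
#define SK_INCOMPAT_FLEX_BG  0x0200U
#define SK_INCOMPAT_EA_INODE 0x0400U
#define SK_INCOMPAT_DIRDATA  0x1000U

/*
 * Bits that are set on disk but missing from the driver's supported mask.
 * A nonzero result means the mount is refused, and this value is what
 * appears in parentheses in the kernel message.
 */
static uint32_t sk_unsupported_incompat(uint32_t on_disk, uint32_t supported)
{
        return on_disk & ~supported;
}
```

With on-disk features 0x1602 and a supported mask that lacks SK_INCOMPAT_EA_INODE, the result is 0x400. Adding the bit to the mask makes the result 0 and the mount check passes.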

Comment by James A Simmons [ 06/Sep/18 ]

You are right. Looking at the RHEL7 alt kernel most of the ea_inode landed but not all. Only bits and pieces

Comment by James A Simmons [ 06/Sep/18 ]

Okay, I found the problem. For the RHEL7 alt kernel 4.14.0-49.6.1.el7a.ppc64le, all the large EA work landed except for adding EXT4_FEATURE_INCOMPAT_EA_INODE to EXT4_FEATURE_INCOMPAT_SUPP. So it was a kernel bug. I have to talk to Oleg to see if the ARM kernel he is using has this bug as well.

 

Comment by James A Simmons [ 27/Sep/18 ]

The only problem I have seen with e2fsprogs 1.44 is the "LUN is too large" issue. Let me see if I can reproduce and track down the problem.

Comment by Andreas Dilger [ 15/Oct/18 ]

James, we've made the e2fsprogs-1.44.3-wc1 release. Builds are available under https://build.whamcloud.com/job/e2fsprogs-master/ but have not been pushed to the download site while they undergo final testing.

Could you please explain what the "LUN is too large" issue is? It isn't referenced anywhere else in this ticket.

If you aren't having any more problems I'd like to close this, as it is mostly a duplicate of LU-11440.

Comment by James A Simmons [ 15/Oct/18 ]

Still need to look into the "LUN is too large" error.
