
use ext4 features by default for newly formatted filesystems

Details

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Major
    • Lustre 2.1.0
    • Lustre 2.1.0
    • None
    • 5035

    Description

      There are a number of ext4 features that we should enable by default for newly formatted ldiskfs filesystems. In particular, the flex_bg option is important for reducing e2fsck time and for avoiding the "slow first write" issues that have hit a number of customers with fuller OSTs. Using flex_bg would avoid a 10-minute delay at mount time or on each e2fsck run. It would also be useful to enable other features such as huge_file (files > 2TB) and dir_nlink (> 65000 subdirectories) by default.

      All of these features are enabled by default if we format the filesystem with the option "-t ext4". Alternatively, we could enable them individually in enable_default_backfs_features().

      See http://events.linuxfoundation.org/slides/2010/linuxcon_japan/linuxcon_jp2010_fujita.pdf for a summary of improvements. While we won't see the 12h e2fsck -> 5 minute e2fsck improvement shown there (we already use extents and uninit_bg), the flex_bg feature is definitely still a win.
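      As an illustrative sketch (not from the ticket; {dev} is a placeholder, and the exact feature list chosen by enable_default_backfs_features() may differ), the features can be requested at format time and verified afterwards:

        mke2fs -t ext4 -F {dev}                                      # pull in the ext4 default feature set
        mke2fs -j -O flex_bg,huge_file,dir_nlink,uninit_bg -F {dev}  # or enable the features individually
        dumpe2fs -h {dev} | grep 'Filesystem features'               # confirm which features were set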

      Attachments

        Activity


          Formatting the MDT also worked when I added --mkfsoptions="-i 4096" to mkfs.lustre...

          ihara Shuichi Ihara (Inactive) added a comment
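          As a side check (not part of the comment above), the resulting inode count can be confirmed after formatting with dumpe2fs:

          dumpe2fs -h /dev/mpath/mdt | grep 'Inode count'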

          I'm also interested in these patches and just tested the patched RPMs. When I formatted the MDT (16TB), it failed with the following errors. Any advice? The OST format worked well.

          mkfs.lustre --verbose --reformat --mgs --mdt /dev/mpath/mdt

          Permanent disk data:
          Target: lustre-MDTffff
          Index: unassigned
          Lustre FS: lustre
          Mount type: ldiskfs
          Flags: 0x75
          (MDT MGS needs_index first_time update )
          Persistent mount opts: user_xattr,errors=remount-ro
          Parameters:

          device size = 14934016MB
          formatting backing filesystem ldiskfs on /dev/mpath/mdt
          target name lustre-MDTffff
          4k blocks 3823108096
          options -J size=400 -I 512 -i 2048 -O dirdata,uninit_bg,dir_nlink,huge_file,flex_bg -E lazy_journal_init, -F
          mkfs_cmd = mke2fs -j -b 4096 -L lustre-MDTffff -J size=400 -I 512 -i 2048 -O dirdata,uninit_bg,dir_nlink,huge_file,flex_bg -E lazy_journal_init, -F /dev/mpath/mdt 3823108096
          cmd: mke2fs -j -b 4096 -L lustre-MDTffff -J size=400 -I 512 -i 2048 -O dirdata,uninit_bg,dir_nlink,huge_file,flex_bg -E lazy_journal_init, -F /dev/mpath/mdt 3823108096
          mke2fs 1.41.12.2.ora1 (14-Aug-2010)
          mke2fs: too many inodes (7646216192), raise inode ratio?

          mkfs.lustre FATAL: Unable to build fs /dev/mpath/mdt (256)

          mkfs.lustre FATAL: mkfs failed 256

          ihara Shuichi Ihara (Inactive) added a comment
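          For reference, the inode count in the failure above follows directly from the device size and the bytes-per-inode ratio:

          3823108096 blocks x 4096 bytes/block = 15,659,450,761,216 bytes
          15,659,450,761,216 bytes / 2048 bytes per inode = 7,646,216,192 inodes (over the ext4 limit of 2^32 - 1)
          15,659,450,761,216 bytes / 4096 bytes per inode = 3,823,108,096 inodes (within the limit, hence "-i 4096")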

          Oleg, this patch should be included in the 2.1 release - it dramatically speeds up mkfs and should fix (for new filesystems) the slow startup problems seen in LU-15.

          adilger Andreas Dilger added a comment

          Jeremy, test RPMs are available via http://review.whamcloud.com/#change,480 if you are able to test them. They are built from the lustre-release repository, so the mkfs.lustre is not directly useful to you if you are testing on 1.8.x.

          The default parameters for an OST with this patch (assuming a large-enough LUN size and ext4-based ldiskfs) are:

          mke2fs -j -b 4096 -L lustre-OSTffff -J size=400 -I 256 -i 262144 -O extents,uninit_bg,dir_nlink,huge_file,flex_bg -G 256 -E resize=4290772992,lazy_journal_init, -F {dev}

          For an MDT they are:

          mke2fs -j -b 4096 -L lustre-MDTffff -J size=400 -I 512 -i 2048 -O dirdata,uninit_bg,dir_nlink,huge_file,flex_bg -E lazy_journal_init, -F {dev}
          adilger Andreas Dilger added a comment

          Andreas seems to be working on this

          pjones Peter Jones added a comment

          I'll be running some testing with ~8 TB and larger LUNs over the next few weeks to see the performance impact of various settings for the number of groups in a flexible block group; when I have some results I will post them here. My main focus, though, is to alleviate the slow mounts and other issues from LU-15. At the very least, mkfs.lustre time for a 9 TB LUN drops from 17 minutes to 6 minutes with a value of >64 for the number of groups.

          jfilizetti Jeremy Filizetti added a comment
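          A minimal sketch of trying a specific flex_bg group count from mkfs.lustre, as discussed in the comment above (the fsname, MGS NID, and device are placeholders, and -G 64 is only an example value):

          mkfs.lustre --ost --fsname=lustre --mgsnode={mgs_nid} --mkfsoptions="-G 64" --reformat {dev}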

          People

            Assignee: adilger Andreas Dilger
            Reporter: adilger Andreas Dilger