
use ext4 features by default for newly formatted filesystems

Details

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Major
    • Lustre 2.1.0
    • Lustre 2.1.0
    • None
    • 5035

    Description

      There are a number of ext4 features that we should enable by default for newly formatted ldiskfs filesystems. In particular, the flex_bg option is important for reducing e2fsck time and for avoiding the "slow first write" issues that have hit a number of customers with fuller OSTs. Using flex_bg would avoid a 10-minute delay at mount time or on each e2fsck run. It would also be useful to enable other features such as huge_file (files > 2TB) and dir_nlink (> 65000 subdirectories) by default.

      All of these features are enabled by default if we format the filesystem with the option "-t ext4". Alternatively, we could enable them individually in enable_default_backfs_features().

      See http://events.linuxfoundation.org/slides/2010/linuxcon_japan/linuxcon_jp2010_fujita.pdf for a summary of improvements. While we won't see the 12h e2fsck -> 5 minute e2fsck improvement shown there (we already use extents and uninit_bg), the flex_bg feature is definitely still a win.
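      As an illustrative sketch (not from the ticket; {dev} is a placeholder, and the exact feature list chosen by enable_default_backfs_features() may differ), the features can be requested at format time and verified afterwards:

        mke2fs -t ext4 -F {dev}                                      # pull in the ext4 default feature set
        mke2fs -j -O flex_bg,huge_file,dir_nlink,uninit_bg -F {dev}  # or enable the features individually
        dumpe2fs -h {dev} | grep 'Filesystem features'               # confirm which features were set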

      Attachments

        Activity


          Formatting the MDT also worked when I added --mkfsoptions="-i 4096" to mkfs.lustre...

          ihara Shuichi Ihara (Inactive) added a comment
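          As a side check (not part of the comment above), the resulting inode count can be confirmed after formatting with dumpe2fs:

          dumpe2fs -h /dev/mpath/mdt | grep 'Inode count'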

          I'm also interested in these patches and just tested the patched RPMs. When I formatted the MDT (16TB), it failed with the following errors. Any advice? The OST format worked well.

          mkfs.lustre --verbose --reformat --mgs --mdt /dev/mpath/mdt

          Permanent disk data:
          Target: lustre-MDTffff
          Index: unassigned
          Lustre FS: lustre
          Mount type: ldiskfs
          Flags: 0x75
          (MDT MGS needs_index first_time update )
          Persistent mount opts: user_xattr,errors=remount-ro
          Parameters:

          device size = 14934016MB
          formatting backing filesystem ldiskfs on /dev/mpath/mdt
          target name lustre-MDTffff
          4k blocks 3823108096
          options -J size=400 -I 512 -i 2048 -O dirdata,uninit_bg,dir_nlink,huge_file,flex_bg -E lazy_journal_init, -F
          mkfs_cmd = mke2fs -j -b 4096 -L lustre-MDTffff -J size=400 -I 512 -i 2048 -O dirdata,uninit_bg,dir_nlink,huge_file,flex_bg -E lazy_journal_init, -F /dev/mpath/mdt 3823108096
          cmd: mke2fs -j -b 4096 -L lustre-MDTffff -J size=400 -I 512 -i 2048 -O dirdata,uninit_bg,dir_nlink,huge_file,flex_bg -E lazy_journal_init, -F /dev/mpath/mdt 3823108096
          mke2fs 1.41.12.2.ora1 (14-Aug-2010)
          mke2fs: too many inodes (7646216192), raise inode ratio?

          mkfs.lustre FATAL: Unable to build fs /dev/mpath/mdt (256)

          mkfs.lustre FATAL: mkfs failed 256

          ihara Shuichi Ihara (Inactive) added a comment
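          For reference, the inode count in the failure above follows directly from the device size and the bytes-per-inode ratio:

          3823108096 blocks x 4096 bytes/block = 15,659,450,761,216 bytes
          15,659,450,761,216 bytes / 2048 bytes per inode = 7,646,216,192 inodes (over the ext4 limit of 2^32 - 1)
          15,659,450,761,216 bytes / 4096 bytes per inode = 3,823,108,096 inodes (within the limit, hence "-i 4096")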

          Oleg, this patch should be included in the 2.1 release - it dramatically speeds up mkfs and should fix (for new filesystems) the slow startup problems seen in LU-15.

          adilger Andreas Dilger added a comment

          Jeremy, test RPMs are available via http://review.whamcloud.com/#change,480 if you are able to test them. They are built from the lustre-release repository, so the mkfs.lustre is not directly useful to you if you are testing on 1.8.x.

          The default parameters for an OST with this patch (assuming a large-enough LUN size and ext4-based ldiskfs) are:

          mke2fs -j -b 4096 -L lustre-OSTffff -J size=400 -I 256 -i 262144 -O extents,uninit_bg,dir_nlink,huge_file,flex_bg -G 256 -E resize=4290772992,lazy_journal_init, -F {dev}

          For an MDT they are:

          mke2fs -j -b 4096 -L lustre-MDTffff -J size=400 -I 512 -i 2048 -O dirdata,uninit_bg,dir_nlink,huge_file,flex_bg -E lazy_journal_init, -F {dev}
          adilger Andreas Dilger added a comment

          Andreas seems to be working on this

          pjones Peter Jones added a comment

          I'll be running some testing with ~8 TB and larger LUNs over the next few weeks to see the performance impact of various settings for the number of groups in a flexible block group; when I have some results I will post them here. My main focus, though, is to alleviate the slow mounts and other issues from LU-15. At the very least, mkfs.lustre time for a 9 TB LUN drops from 17 minutes to 6 minutes with a value of >64 for the number of groups.

          jfilizetti Jeremy Filizetti added a comment
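          A minimal sketch of trying a specific flex_bg group count from mkfs.lustre, as discussed in the comment above (the fsname, MGS NID, and device are placeholders, and -G 64 is only an example value):

          mkfs.lustre --ost --fsname=lustre --mgsnode={mgs_nid} --mkfsoptions="-G 64" --reformat {dev}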

          People

            Assignee: adilger Andreas Dilger
            Reporter: adilger Andreas Dilger