Details
Type: Improvement
Resolution: Fixed
Priority: Minor
Description
In LU-255, mkfs.lustre was changed to create new OST and MDT filesystems with features that are only available in ext4-based ldiskfs. In addition, the default number of inodes created on both MDT and OST filesystems was changed from the early, very conservative, values to ones that more accurately reflect how Lustre is used today.
In particular:
- new features enabled by default, if the installed e2fsprogs supports them:
  - flex_bg - aggregates the bitmaps and inode tables of multiple block
    groups together, to avoid seeking when reading or writing the bitmaps
    and to reduce read/modify/write on RAID storage. This is enabled on
    both OST and MDT filesystems. On MDT filesystems the flex_bg factor
    (the number of groups whose metadata is co-located on disk) is left
    at the default of 16, while on OSTs it is set to 256, so that all of
    the block or inode bitmaps in a single flex_bg can be read or written
    in a single IO on typical RAID storage.
  - huge_file - allows files on OSTs to be larger than 2TB. Using objects
    larger than 2TB still depends on support in Lustre itself.
- changes to the default number of inodes created on the filesystems:
  - on MDTs, the number of inodes created for a given filesystem size is
    doubled compared to previous versions of Lustre. There is now one
    inode for each 2kB of the LUN on which the filesystem is created,
    unless the amount of space needed for the filesystem's default
    striping (as specified by the "--stripe_count_hint=N" option) is
    larger.
  - on OSTs, the number of inodes created on larger LUNs is increased.
    As with previous versions of mkfs.lustre, the default inode ratio can
    be overridden by passing the "-i <ratio>" option to --mkfsoptions.
    The default OST inode ratio depends on the LUN size:

      LUN size      inode ratio       total inodes
      < 10GiB       1 inode/16KiB     640k-655k
      10GiB-1TiB    1 inode/68KiB     153k-15.7M
      1TiB-8TiB     1 inode/256KiB    4.2M-33.6M
      > 8TiB        1 inode/1MiB      8.4M-134M
- reduction in the time it takes to format a filesystem:
  - the "lazy_journal_init" feature is enabled by default, to avoid a full
    overwrite of the 400MB journal that Lustre allocates by default.
  - on devices that support the SCSI UNMAP or ATA TRIM command and also
    return zeros when reading UNMAP/TRIM regions, the underlying device
    is completely erased at format time. This avoids the need to zero out
    the blocks used by the journal and inode tables, significantly
    reducing format time.
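To make the flex_bg factors above concrete, here is a small worked calculation of how much contiguous bitmap data one flex_bg holds. It is a sketch under assumptions not stated in this issue: 4 KiB filesystem blocks and the usual ext4 layout of one block-bitmap block (and one inode-bitmap block) per block group, which gives 32768 blocks (128 MiB) per group.

```python
# Sketch: size of the contiguous bitmap region in one flex_bg.
# Assumptions (not from the issue text): 4 KiB blocks, one block bitmap
# block and one inode bitmap block per block group.

BLOCK_SIZE = 4096                    # bytes per filesystem block (assumed)
BLOCKS_PER_GROUP = 8 * BLOCK_SIZE    # one bitmap block has 8 * 4096 bits,
                                     # one bit per block in the group


def flex_bg_bitmap_bytes(flex_bg_factor: int) -> int:
    """Bytes of block bitmaps packed contiguously for one flex_bg.

    With flex_bg, the block bitmaps of `flex_bg_factor` groups sit next to
    each other on disk, so they can be read or written in one sequential
    IO. The inode bitmaps are packed the same way and have the same size.
    """
    return flex_bg_factor * BLOCK_SIZE


def flex_bg_span_bytes(flex_bg_factor: int) -> int:
    """Bytes of filesystem space whose allocation those bitmaps describe."""
    return flex_bg_factor * BLOCKS_PER_GROUP * BLOCK_SIZE


if __name__ == "__main__":
    for name, factor in (("MDT", 16), ("OST", 256)):
        print(f"{name}: flex_bg={factor}, "
              f"bitmap IO={flex_bg_bitmap_bytes(factor) // 1024} KiB, "
              f"covers {flex_bg_span_bytes(factor) // 2**30} GiB")
    # prints:
    # MDT: flex_bg=16, bitmap IO=64 KiB, covers 2 GiB
    # OST: flex_bg=256, bitmap IO=1024 KiB, covers 32 GiB
```

Under these assumptions the OST factor of 256 puts 1 MiB of block bitmaps in one run, which is in the range of a full stripe on many RAID configurations.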
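The default inode ratios described above can be summarized as a lookup from LUN size to bytes-per-inode. This is a sketch, not mkfs.lustre's actual code: the OST breakpoints come from the table in this issue, but how a size exactly on a boundary is treated is an assumption. The MDT striping adjustment is modelled by a hypothetical `min_bytes_per_inode` argument, because the issue does not give the striping-based formula.

```python
# Sketch of the default inode ratios from this issue.
# Boundary handling (< vs <=) and `min_bytes_per_inode` are assumptions.

KiB = 1024
MiB = 1024 * KiB
GiB = 1024 * MiB
TiB = 1024 * GiB


def ost_bytes_per_inode(lun_bytes: int) -> int:
    """Default OST inode ratio (bytes of LUN per inode) by LUN size."""
    if lun_bytes < 10 * GiB:
        return 16 * KiB
    if lun_bytes < 1 * TiB:
        return 68 * KiB
    if lun_bytes < 8 * TiB:
        return 256 * KiB
    return 1 * MiB


def mdt_bytes_per_inode(min_bytes_per_inode: int = 0) -> int:
    """MDT ratio: one inode per 2 KiB, unless striping (hypothetical
    `min_bytes_per_inode`) needs more space per inode."""
    return max(2 * KiB, min_bytes_per_inode)


def inode_count(lun_bytes: int, bytes_per_inode: int) -> int:
    """Number of inodes created for a LUN at the given ratio."""
    return lun_bytes // bytes_per_inode


if __name__ == "__main__":
    for size in (8 * GiB, 512 * GiB, 4 * TiB, 16 * TiB):
        ratio = ost_bytes_per_inode(size)
        print(f"OST {size / GiB:>6.0f} GiB: 1 inode/{ratio // KiB} KiB "
              f"-> {inode_count(size, ratio):,} inodes")
    mdt = 1 * TiB
    print(f"MDT 1 TiB: {inode_count(mdt, mdt_bytes_per_inode()):,} inodes")
```

For example, a 4 TiB OST gets one inode per 256 KiB, or 2^24 (about 16.8M) inodes, which falls inside the 4.2M-33.6M range in the table.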
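Overriding the default ratio means the "-i <ratio>" string has to reach mke2fs as part of the --mkfsoptions value. The sketch below only builds the command line and does not run it. The device path, filesystem name, MGS NID and index used in the example are hypothetical placeholders; the options themselves (--ost, --fsname, --mgsnode, --index, --mkfsoptions) are standard mkfs.lustre options, but check the mkfs.lustre man page for the installed version.

```python
import shlex
from typing import List


def mkfs_lustre_ost_argv(device: str, fsname: str, mgsnode: str,
                         index: int, bytes_per_inode: int) -> List[str]:
    """Build (but do not run) an mkfs.lustre command for an OST whose
    inode ratio is overridden with "-i <ratio>" inside --mkfsoptions."""
    return [
        "mkfs.lustre",
        "--ost",
        f"--fsname={fsname}",
        f"--mgsnode={mgsnode}",
        f"--index={index}",
        # One argv element: mkfs.lustre hands this string on to mke2fs.
        f"--mkfsoptions=-i {bytes_per_inode}",
        device,
    ]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    argv = mkfs_lustre_ost_argv("/dev/sdX", "testfs", "mgs@tcp", 0, 128 * 1024)
    print(shlex.join(argv))  # shell-quoted form of the command
```

Keeping the command as an argument list (for example for `subprocess.run`) avoids having to quote the space inside the --mkfsoptions value by hand.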
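A rough calculation shows why skipping the journal and inode table overwrite matters on large OSTs. This sketch assumes the "400MB" journal means 400 MiB and a 256-byte on-disk inode size; the issue states neither, so treat the result as an order-of-magnitude estimate.

```python
# Sketch: bytes a full overwrite of the journal and inode tables would
# have to write at format time.
# Assumptions: 400 MiB journal, 256-byte on-disk inodes.

MiB = 1024 * 1024
TiB = 1024 ** 4
JOURNAL_BYTES = 400 * MiB  # assumed interpretation of "400MB"
INODE_SIZE = 256           # assumed on-disk inode size in bytes


def zeroing_bytes(inode_count: int, inode_size: int = INODE_SIZE,
                  journal_bytes: int = JOURNAL_BYTES) -> int:
    """Journal plus inode tables: what a full overwrite would write."""
    return journal_bytes + inode_count * inode_size


if __name__ == "__main__":
    inodes = (16 * TiB) // MiB  # 16 TiB OST at 1 inode/1MiB
    total = zeroing_bytes(inodes)
    print(f"{inodes:,} inodes -> {total / 2**30:.2f} GiB to zero")
    # prints: 16,777,216 inodes -> 4.39 GiB to zero
```

Under these assumptions, a 16 TiB OST would otherwise need about 4.4 GiB of writes before it is usable. On an UNMAP/TRIM device that returns zeros, erasing the whole device at format time avoids that cost.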