Details
Improvement
Resolution: Fixed
Major
Lustre 2.1.0
None
5035
Description
There are a number of ext4 features that we should be enabling by default for newly-formatted ldiskfs filesystems. In particular, the flex_bg option is important for reducing e2fsck time as well as avoiding the "slow first write" issues that have hit a number of customers with fuller OSTs. Using flex_bg would avoid a 10-minute delay at mount time and on each e2fsck run. It would also be useful to enable other features like huge_file (files > 2TB) and dir_nlink (> 65000 subdirectories) by default.
All of these features are enabled by default if we format the filesystem with the option "-t ext4". Alternatively, we could enable them individually in enable_default_backfs_features().
See http://events.linuxfoundation.org/slides/2010/linuxcon_japan/linuxcon_jp2010_fujita.pdf for a summary of improvements. While we won't see the 12h e2fsck -> 5 minute e2fsck improvement shown there (we already use extents and uninit_bg), the flex_bg feature is definitely still a win.
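The two approaches described above can be sketched as mke2fs command lines. This is only an illustration: mke2fs_args() is a hypothetical helper, not the actual enable_default_backfs_features() code, /dev/sdX is a placeholder device, and the other options that mkfs.lustre passes to mke2fs are left out.

```python
# Hypothetical sketch, not Lustre's enable_default_backfs_features().
# The feature names are the ones discussed in this ticket.
EXTRA_FEATURES = ("flex_bg", "huge_file", "dir_nlink")


def mke2fs_args(device, use_ext4_type=False, features=EXTRA_FEATURES):
    """Return an mke2fs argument list for the given device."""
    args = ["mke2fs"]
    if use_ext4_type:
        # Option 1: let the "ext4" filesystem type select the features.
        args += ["-t", "ext4"]
    elif features:
        # Option 2: enable each feature explicitly with -O.
        args += ["-O", ",".join(features)]
    args.append(device)
    return args


if __name__ == "__main__":
    print(" ".join(mke2fs_args("/dev/sdX", use_ext4_type=True)))
    print(" ".join(mke2fs_args("/dev/sdX")))
```

Listing the features individually keeps the set independent of whatever the local mke2fs configuration defines for "ext4", at the cost of having to maintain the list by hand.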
Realistically, it is very unlikely that anything from the old internal journal would be re-used in this case. The journal superblock will be rewritten with a new journal transaction ID of 1 and marked as having no outstanding transactions to recover, and when the filesystem is mounted the TID will increment from 1.
If the node crashed before it had overwritten the journal (unlikely even under relatively low usage), there would still need to be old transactions left in the journal that both start right after the end of the current transaction and carry the next TID in sequence.
In practice I think the chance of this is very low except in test filesystems that are reformatted repeatedly after a very short lifespan, but if you want I could drop this part of the patch. It avoids 400MB of IO to the device at mke2fs time, but even then this is small compared with the inode table blocks being written.
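The argument about stale journal contents can be illustrated with a toy model of journal recovery. This is a simplified sketch, not the jbd2 code: each transaction is reduced to a single slot holding its TID, and replayable(), the slot layout, and the TID values are all assumptions made for the example.

```python
def replayable(journal, start, first_tid):
    """Return the TIDs that recovery would replay in this toy model.

    journal:   list of TIDs, one per journal slot (stale or new).
    start:     slot of the first transaction to recover, or None if the
               journal superblock records nothing to recover.
    first_tid: TID expected in that first slot.
    """
    if start is None:
        # Freshly formatted: superblock says nothing is outstanding,
        # so stale slots are never looked at.
        return []
    replayed = []
    pos, tid = start, first_tid
    # Keep going only while the next slot holds the next TID in sequence.
    while len(replayed) < len(journal) and journal[pos] == tid:
        replayed.append(tid)
        pos = (pos + 1) % len(journal)
        tid += 1
    return replayed


if __name__ == "__main__":
    # Stale journal left in place by a reformat that skipped zeroing.
    print(replayable([501, 502, 503, 504], None, 1))
    # New transactions 2 and 3 written, then a crash: stale TID 503 stops replay.
    print(replayable([2, 3, 503, 504], 0, 2))
    # Problem case: a stale slot right after the new ones happens to hold TID 4.
    print(replayable([2, 3, 4, 504], 0, 2))
```

Only the last case replays stale data, and it requires a leftover transaction that matches in both position and TID, which is why it is mostly a concern for filesystems reformatted repeatedly after very short lifetimes.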