[LU-16305] mkfs.lustre fails on devices between 16TiB-32GiB and 16TiB-1B Created: 09/Nov/22 Updated: 13/Nov/22 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Ellis Wilson | Assignee: | Ellis Wilson |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Epic/Theme: | ldiskfs |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Running mkfs.lustre on a device that is smaller than 16TiB but larger than the "resize" value computed by mkfs.lustre (for 4KiB blocks, 32GiB shy of 16TiB) fails, because mke2fs requires the resize value to be greater than the specified capacity. Example: |
| Comments |
| Comment by Andreas Dilger [ 09/Nov/22 ] |
|
Ellis, thanks for filing the ticket. Until this is fixed in the code, it should be possible to work around the issue by adding ",^resize_inode" to the "-O" feature list, and removing "resize=4290772992," from the "-E" extended options list on the mkfs.lustre command line. |
| Comment by Ellis Wilson [ 09/Nov/22 ] |
|
No problem! I fixed it with the following patch (going through the mechanics presently to test/submit the patch):
--- a/lustre/utils/libmount_utils_ldiskfs.c
+++ b/lustre/utils/libmount_utils_ldiskfs.c
@@ -885,6 +885,15 @@ int ldiskfs_make_lustre(struct mkfs_opts *mop)
 		append_unique(start, ext_opts ? "," : " -E ",
 			      "resize", buf, maxbuflen);
 		ext_opts = 1;
+
+		/* The resize maximum must be greater than filesystem size, but for disks
+		 * or arrays just shy of 16TiB you can get into a situation where capacity
+		 * is between resize_blks and 16TiB. Shrink the drive size to 1MiB less
+		 * than resize in these scenarios (at most ~0.1% capacity is lost).
+		 */
+		if (resize_blks <= mop->mo_device_kb / mop->mo_blocksize_kb) {
+			mop->mo_device_kb = (long long)(resize_blks) * (long long)mop->mo_blocksize_kb - 1024;
+		}
 	}
 	/* Avoid zeroing out the full journal - speeds up mkfs */
As posted on lustre-discuss, I have some questions about the intent behind resize, and I don't know how to get this ticket assigned to me (maybe that's only possible for WC people). |
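For readers who want to try the arithmetic outside of mkfs.lustre, the clamp in the patch above can be sketched as a standalone function. This is only an illustration: `clamp_device_kb` and its parameters are hypothetical names, not part of the Lustre source, and `resize_blks` is taken as an input rather than computed the way mkfs.lustre computes it.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a Lustre function): returns the device size in
 * KiB that would be handed to mke2fs after applying the clamp from the patch.
 *
 *   device_kb    - device capacity in KiB
 *   blocksize_kb - filesystem block size in KiB (e.g. 4 for 4KiB blocks)
 *   resize_blks  - value that would be passed as "-E resize=", in blocks
 */
uint64_t clamp_device_kb(uint64_t device_kb, uint64_t blocksize_kb,
			 uint64_t resize_blks)
{
	assert(blocksize_kb > 0);

	/* Per the ticket, mke2fs needs resize to be greater than the
	 * filesystem size, so a device with at least resize_blks blocks
	 * is in the failing range. */
	if (resize_blks <= device_kb / blocksize_kb)
		/* Shrink to 1MiB (1024 KiB) below the resize limit. */
		return resize_blks * blocksize_kb - 1024;

	/* Smaller devices are left untouched. */
	return device_kb;
}
```

With the resize value quoted earlier in this ticket (4290772992) and 4KiB blocks, a device of 17178820608 KiB (1GiB short of 16TiB) would come back as 17163090944 KiB, while an 8TiB device is returned unchanged.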
| Comment by Andreas Dilger [ 10/Nov/22 ] |
|
Rather than shrink the MDT device, it would be better to just disable the resize_inode feature for such filesystems, since it is not useful for filesystems over 16TiB anyway. |
| Comment by Ellis Wilson [ 10/Nov/22 ] |
|
I believe this only applies to OSTs, and while I can disable it I'd like to better understand what the optimization is attempting to accomplish first. I think you put this block in around 2011 (could totally be wrong – it's moved around a few times). Do you remember what it was accomplishing? I'm really struggling to understand this comment block: Is ext metadata really unaligned without specifying resize? Some docs suggest that without giving this, mke2fs plans for up to 1024 times the original size of the filesystem, so I don't feel like this is a case where we're trying to plan ahead more than mke2fs already does. |
| Comment by Andreas Dilger [ 10/Nov/22 ] |
|
The resize_inode feature only works up to 16TB, so it is basically useless for the problematic filesystem and may as well be disabled for such filesystems. There is a different feature (meta_bg) that is used for resizing filesystems beyond 16TB. The 1024x resize is based on a starting filesystem size that is much smaller. Yes, the ext4 metadata is not aligned to 1MB boundaries by default, and this option (along with some others added in the same patch) ensures that other metadata is located with proper 1MB alignment for HDD RAID alignment. That is not so important for flash MDTs at this point either. So my approach to fixing this issue would be to disable the resize_inode feature (if this isn't done automatically already) and not specify the "-E resize=nnnn" option for filesystems that are close to 16TB in size. |
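The approach described in this comment could be sketched as a simple predicate that decides whether to pass the resize option at all. This is a sketch under assumptions, not the eventual fix: `want_resize_option` is a hypothetical name, and treating "close to 16TB" as "the device has at least `resize_blks` blocks" is an assumption made here for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical predicate (not a Lustre function): should
 * "-E resize=<resize_blks>" be passed to mke2fs for this device?
 *
 * Assumption: the option is skipped whenever the device already holds
 * resize_blks blocks or more, which is the range where mke2fs fails.
 */
bool want_resize_option(uint64_t device_kb, uint64_t blocksize_kb,
			uint64_t resize_blks)
{
	assert(blocksize_kb > 0);

	/* Only request resize when the limit is strictly above the
	 * filesystem size in blocks. */
	return device_kb / blocksize_kb < resize_blks;
}
```

When the predicate returns false, a caller would leave "resize=" out of the "-E" list and add "^resize_inode" to the "-O" list, mirroring the manual workaround given earlier in the ticket. Unlike the clamp, this keeps the full device capacity.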
| Comment by Ellis Wilson [ 10/Nov/22 ] |
|
Thanks for the clarification Andreas. I've revised my in-house fix, and will run it through the steps on your submitting changes wiki shortly. |
| Comment by Peter Jones [ 13/Nov/22 ] |
|
elliswilson, I have added you to the developers group for the community project, so you should now be able to do things like assign tickets to yourself. |