[LU-9309] Add ldiskfs 64-bit inode number support Created: 10/Apr/17  Updated: 01/May/23

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: New Feature Priority: Major
Reporter: Artem Blagodarenko (Inactive) Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Related
is related to LU-11213 DNE3: remote mkdir() in ROOT/ by default Resolved
is related to LU-10784 DNE3: mkdir() automatically create re... Resolved
Rank (Obsolete): 9223372036854775807

 Description   

With current hardware, clusters have trouble creating enough inodes on LDISKFS partitions. The MDS uses zero-size files to store information about Lustre FS files. Current MDS disk sizes could hold a large number of such files, but EXT4 limits the inode count to ~4 billion (2^32).
Lustre FS has features like DNE to distribute the MDS over many targets (disks), but the disks are not used effectively. It would be great to be able to store more than ~4 billion inodes on one EXT4 file system.

This topic ("64-bit inode number") was recently discussed on the ext4 list. The summary is:

There are two possible solutions:
1. Store the higher 32 bits of the inode number in the ext4 dirent
2. Add a new feature flag that defines the use of a 64-bit inode number

Andreas Dilger gave strong reasons to use the first solution:

  • this won't use more space for 64-bit inodes than ext4_dir_entry64
  • dirents that hold 32-bit inode numbers will be smaller
  • significantly more 32-bit dirents can fit into a leaf block (i.e. 10-25%)
  • it is backwards compatible with existing directories and can transparently store 64-bit inode numbers into 32-bit directories without a full update
  • it avoids duplicate code paths for ext4_dir_entry vs ext4_dir_entry64
  • it would be possible to store only the high 16 bits (2^48 inodes), which may be enough for ext4: ext4_extent can only address 2^48 blocks (2^60 bytes with 4 KiB blocks), and there probably isn't much value in having more inodes than blocks
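
The dirent density argument in the list above can be checked with a little arithmetic. The sketch below assumes the classic ext4 dirent header of 8 bytes (4-byte inode, 2-byte rec_len, 1-byte name_len, 1-byte file_type), a hypothetical ext4_dir_entry64 header of 12 bytes (the same fields with an 8-byte inode), 4-byte record alignment, and a 4 KiB block with no per-block overhead. The 12-byte layout and the helper names are assumptions for illustration, not a format taken from any patch.

```c
/* Classic ext4 dirent header: inode(4) + rec_len(2) + name_len(1) + file_type(1). */
#define DIRENT32_HDR 8u
/* Hypothetical ext4_dir_entry64 header with an 8-byte inode field (assumption). */
#define DIRENT64_HDR 12u

/* Minimal record length for a name, rounded up to a 4-byte boundary. */
static unsigned dirent_rec_len(unsigned hdr, unsigned name_len)
{
    return (hdr + name_len + 3u) & ~3u;
}

/* Number of minimal-size dirents that fit in one block.
 * Ignores per-block overhead such as a checksum tail. */
static unsigned dirents_per_block(unsigned block_size, unsigned hdr,
                                  unsigned name_len)
{
    return block_size / dirent_rec_len(hdr, name_len);
}
```

Under these assumptions, dirents_per_block(4096, DIRENT32_HDR, n) versus dirents_per_block(4096, DIRENT64_HDR, n) gives 256 vs. 204 entries for 8-character names, 170 vs. 146 for 16-character names, and 102 vs. 93 for 32-character names, which is roughly the 10-25% range quoted above.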

This issue is about using dirdata to store the high bits of the 64-bit inode number.
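
As a rough illustration of the idea, the sketch below splits a 64-bit inode number into the low 32 bits (which would stay in the existing dirent inode field) and the high bits (which would go into a dirdata field), then joins them again. The struct and function names are hypothetical and are not taken from the ldiskfs patches; the actual on-disk dirdata encoding is not shown here.

```c
#include <stdint.h>

/* Hypothetical in-memory view of a split inode number (illustration only). */
struct ino_parts {
    uint32_t lo;  /* would live in the existing 32-bit dirent inode field */
    uint32_t hi;  /* would live in a dirdata field; 0 for legacy inodes */
};

/* Split a 64-bit inode number into its low and high 32-bit halves. */
static struct ino_parts ino64_split(uint64_t ino)
{
    struct ino_parts p;

    p.lo = (uint32_t)(ino & 0xffffffffu);
    p.hi = (uint32_t)(ino >> 32);
    return p;
}

/* Rebuild the 64-bit inode number from the two halves. */
static uint64_t ino64_join(struct ino_parts p)
{
    return ((uint64_t)p.hi << 32) | p.lo;
}

/* Nonzero if the inode number fits in 48 bits, i.e. hi fits in 16 bits. */
static int ino64_fits_48(uint64_t ino)
{
    return (ino >> 48) == 0;
}
```

An inode number below 2^32 splits into hi == 0, so in a scheme like this an existing dirent would need no extra dirdata, which is consistent with the backwards-compatibility point above.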



 Comments   
Comment by Andreas Dilger [ 10/Apr/17 ]

Note that I'm not against adding such a feature to ext4/ldiskfs, but it is worthwhile to consider potential issues as well, compared to distributing the filesystem metadata across multiple MDTs with DNE:

  • if there is a problem with such a large MDT then there will only be a single-threaded e2fsck running to repair the MDT filesystem, which could take many hours/days to repair, vs. running e2fsck on multiple MDTs in parallel
  • e2fsck on such a large filesystem will require a large amount of RAM to manage the recovery state
  • if the LMA xattr holding the Lustre FID is lost, there is no easy fallback to IGIF FIDs with 64-bit inode numbers
  • having a single large MDT does not allow scaling performance (network, CPU, RAM) as cost-efficiently as multiple smaller MDTs

I agree that the current DNE implementation does not scale metadata load automatically across MDTs/MDS nodes effectively, though this will be improved with DNE2 and striped directories. My thought for enabling DNE to be more "automatic" in its load balancing is to allow automatic directory restriping when a directory grows larger than some number of entries (e.g. 16k), so that users can have the benefit of DNE without having to manually create striped directories.

If you choose to move forward with MDTs with more than 4B inodes, I'd also encourage you to look at making e2fsck multi-threaded and/or event driven so that it can use multiple CPUs and spindles/SSDs effectively, otherwise the check time may become so long that this is not a practical solution even if the on-disk format supports more than 4B inodes.

Comment by Gerrit Updater [ 25/Sep/17 ]

Artem Blagodarenko (artem.blagodarenko@seagate.com) uploaded a new patch: https://review.whamcloud.com/29195
Subject: LU-9309 ldiskfs: Add 64-bit inode number support
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 4b8dedeba7b8a3b2f24259e3b3442d20e6d5fc69

Comment by Gerrit Updater [ 25/Sep/17 ]

Artem Blagodarenko (artem.blagodarenko@seagate.com) uploaded a new patch: https://review.whamcloud.com/29196
Subject: LU-9309 debugfs: 64bit inode support
Project: tools/e2fsprogs
Branch: master-lustre
Current Patch Set: 1
Commit: 8b2120300e4d4739afb7e45ad962a645e77430ba

Comment by Gerrit Updater [ 25/Sep/17 ]

Artem Blagodarenko (artem.blagodarenko@seagate.com) uploaded a new patch: https://review.whamcloud.com/29197
Subject: LU-9309 badblocks: bad blocks 64bit inode cleanup
Project: tools/e2fsprogs
Branch: master-lustre
Current Patch Set: 1
Commit: d59ff228a04446e22ee0630ff73c868e0e349b7d

Comment by Gerrit Updater [ 25/Sep/17 ]

Artem Blagodarenko (artem.blagodarenko@seagate.com) uploaded a new patch: https://review.whamcloud.com/29198
Subject: LU-9309 ext2fs: add EXT4_FEATURE_INCOMPAT_64INODE suport
Project: tools/e2fsprogs
Branch: master-lustre
Current Patch Set: 1
Commit: c6f3dd0b051cebf5ca6d5d4ca6af06a323fd8506

Comment by Gerrit Updater [ 25/Sep/17 ]

Artem Blagodarenko (artem.blagodarenko@seagate.com) uploaded a new patch: https://review.whamcloud.com/29199
Subject: LU-9309 quota: swaping s_prj_quota_inum superblock field
Project: tools/e2fsprogs
Branch: master-lustre
Current Patch Set: 1
Commit: 011a538a852f5948bb383b8c82892688f9d78d72

Comment by Gerrit Updater [ 25/Sep/17 ]

Artem Blagodarenko (artem.blagodarenko@seagate.com) uploaded a new patch: https://review.whamcloud.com/29200
Subject: LU-9309 quota: quota 64bit inode number cleanup
Project: tools/e2fsprogs
Branch: master-lustre
Current Patch Set: 1
Commit: f1a786e9c8da9d75760cbf43005f62db31bac3d5

Comment by Andreas Dilger [ 11/Jun/20 ]

Link to changes improving DNE usage distribution. More work is still needed to get DNE balance as good as OST space balance.
