[LU-6220] push ext4/ldiskfs patches upstream if possible Created: 06/Feb/15  Updated: 13/Jan/24

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: Upstream

Type: Improvement Priority: Minor
Reporter: Andreas Dilger Assignee: Yang Sheng
Resolution: Unresolved Votes: 1
Labels: None

Issue Links:
Duplicate
is duplicated by LU-6141 Merge ldiskfs patches into upstream e... Closed
Related
is related to LU-13054 MDS kernel BUG at ldiskfs/htree_lock.... Resolved
is related to LU-908 multi-block xattr support Resolved
is related to LU-6030 cleanup for ldiskfs patches Resolved
is related to LU-17423 Upstream dir_data feature Open
is related to LU-9724 update ext4-large-eas.patch to match ... Resolved
is related to LU-12511 Prepare lustre for adoption into the ... Open
Rank (Obsolete): 17396

 Description   

There are a number of ldiskfs patches that we could potentially push upstream, possibly with some cleanups. All patches should be run through checkpatch.pl first.

  • ext4-print-inum-in-htree-warning.patch: could be pushed upstream after moving string to the next line so it fits inside 80 columns
  • ext4-disable-mb-cache.patch: I was involved in some discussion upstream about removing the mbcache (http://lwn.net/Articles/564802/), but that patch was never accepted upstream. We could try pushing this patch again, since it is fairly simple, and it wouldn't be very incompatible with the other mbcache speedup patch if that was ever landed.
  • ext4-inode-version.patch: It might be possible to submit a patch upstream that moves dir->i_version++ into ext4_mark_iloc_dirty() like if (IS_I_VERSION(inode) || S_ISDIR(dir)). Then, we can add a mount flag or patch that makes this check a no-op so we only need a very small patch. More importantly, it will reduce the size of ext4_inode_info because we don't need a separate i_fs_version and reduce the memory usage on the MDS.
  • ext4-journal-path-opt.patch: I think this can be pushed upstream without any significant changes
  • ext4-kill-dx-root.patch: could be submitted upstream as a cleanup with some style fixes
  • ext4-max-dir-size.patch: we should print out a deprecation warning in newer kernels to mount with "-o max_dir_size_kb" instead, and then we can eventually remove it? The mount option has existed since kernel 3.6, maybe it would be better to backport support for the mount option to RHEL6 instead of maintaining this patch forever?
  • ext4-large-eas.patch: this is described in LU-908


 Comments   
Comment by Yang Sheng [ 24/Jul/15 ]

As latest 4.2 kernel status:
ext4-print-inum-in-htree-warning.patch: Already merged by commit b03a2f7eb21cc06b541142684abf7eed6aaccf3e
ext4-journal-path-opt.patch: Already merged by commit ad4eec613536dc7e5ea0c6e59849e6edca634d8b

So just 3 patches need to push upstream.
ext4-disable-mb-cache.patch
ext4-kill-dx-root.patch
ext4-large-dir.patch
But they are conflict heavily against latest kernel tree. I'll work out a draft and attached here for review.

Comment by James A Simmons [ 24/Jul/15 ]

Only 3 more. A lot more patch exist that need to merged upstream to make ldiskfs patchless.

Comment by Yang Sheng [ 13/Aug/15 ]

From http://review.whamcloud.com/#/c/15961/ for record

Andreas Dilger 4:09 AM

Patch Set 5: Code-Review+2

This is much better than before, but not 100% safe if there is another flag used by ext4.

It might be useful to try to push a patch upstream that puts BH_BITMAP_DIRTY into an enum with a "last" value to allow us to ensure that BH_DXLock does not conflict. The reason for the upstream patch could be to verify that BH_BITMAP_DIRTY itself does not overflow the b_state field.

Comment by Andreas Dilger [ 11/Apr/18 ]

The ext4-kill-dx-root, ext4-large-dir, and ext4-disable-mbcache patches were landed in the upstream 4.12 kernel.

The ext4-large-eas patch was landed in the upstream 4.13 kernel.

The ext4-data-in-dirent patch is scheduled to land for the upstream 4.17 kernel.

In the 4.15 kernel, upstream commit ae5e165d855 and commit ee73f9a52a34 changed how i_version is handled by the VFS and the filesystems. We may be able to leverage this change to remove the ext4-inode-version patch. In particular, if we do not set the SB_I_VERSION flag in the superblock, the VFS will no longer modify the i_version field in the inode, so we may be able to avoid storing the Lustre version in struct ext4_inode.i_fs_version, and can use struct inode.i_version for this (save 8 bytes per ldiskfs inode and an ldiskfs patch).

The ext4-nocmtime patch took a bit of a step backward in kernel 4.9, since the ext4_current_time() wrapper that we were using so conveniently was removed and replaced with dozen(s) of direct calls to current_time() from the VFS.

The ext4-mmp-brelse patch equivalent landed upstream in kernel 4.5.

Comment by James A Simmons [ 13/Apr/18 ]

What is left? Its sounds like we are almost complete for this work.

Comment by Andreas Dilger [ 19/Apr/18 ]

The current ldiskfs-4.4-sles12sp3 patch series looks like the following (patches marked '*' are/will be merged soon):

sles12sp2/ext4-inode-version.patch
sles12sp2/ext4-lookup-dotdot.patch
sles12sp2/ext4-print-inum-in-htree-warning.patch
sles12sp2/ext4-prealloc.patch
sles12sp2/ext4-osd-iop-common.patch
sles12sp2/ext4-misc.patch
sles12sp3/ext4-mballoc-extra-checks.patch
sles12sp2/ext4-hash-indexed-dir-dotdot-update.patch
sles12sp2/ext4-kill-dx-root.patch *
rhel7/ext4-mballoc-pa-free-mismatch.patch
sles12sp2/ext4-data-in-dirent.patch *
sles12sp2/ext4-large-eas.patch *
sles12sp2/ext4-disable-mb-cache.patch *
rhel7/ext4-nocmtime.patch
sles12sp2/ext4-large-dir.patch *
sles12sp2/ext4-pdirop.patch
sles12sp2/ext4-max-dir-size.patch
rhel7/ext4-remove-truncate-warning.patch
sles12sp3/ext4-corrupted-inode-block-bitmaps-handling-patches.patch * (just under discussion)
sles12sp2/ext4-give-warning-with-dir-htree-growing.patch
sles12sp2/ext4-mmp-brelse.patch *
rhel7/ext4-jcb-optimization.patch
sles12sp2/ext4-attach-jinode-in-writepages.patch
sles12sp2/ext4-dont-check-before-replay.patch
rhel7.2/ext4-dont-check-in-ro.patch
sles12sp2/ext4-fix-xattr-shifting-when-expanding-inodes.patch *
rhel7/ext4-use-GFP_NOFS-in-ext4_inode_attach_jinode.patch

There may be slightly different patches in the RHEL7 kernel series, but it is against a significantly older kernel so I didn't want to use that as a starting point. This shows there are definitely still a lot of patches to be merged and/or worked around in some other way. At least several of the major on-disk features have been landed, and most of these are only related to in-core changes (API, functionality), so the risk of incompatible on-disk formats is significantly reduced. I'm hoping that e2fsprogs-1.45 might even be able to run e2fsck against an OST or MDT, though I don't think it would be advised without more testing.

Comment by Yang Sheng [ 28/Apr/20 ]

Move from https://review.whamcloud.com/38372 as record

Now you have moved htree locking in a separate patch. I don't think it will be possible to get this merged until there is parallel VFS directory locking in the kernel, but at least it cleans up our own patches.

I think it may be possible to get the ext4-kill-dx-root.patch merged upstream as a code cleanup, which would avoid the need to carry that patch in the Lustre tree forever.

It would be possible to get ext4-data-in-dirent.patch upstream, if we added an ioctl interface for being able to get/set the FID for testing, and similar patches for debugfs to get/set the FID for testing.

Comment by Andreas Dilger [ 11/Jun/20 ]

Patch https://review.whamcloud.com/38372 "LU-13054 ldiskfs: split htree_lock as separate patch" split the ect4-pdirops patch into a separate ext4-htree-lock patch, for a net reduction of 6500 lines of patches.

Comment by James A Simmons [ 27/Sep/20 ]

Track the parallel VFS + ext4 lookup for upstream: https://patchwork.kernel.org/patch/10695037/

Comment by Andreas Dilger [ 09/Jun/22 ]

The data-in-dirent patch was being discussed for upstream inclusion for 64-but inode support, but eventually that was not accepted because the 64-bit inode patch was only partially tested (ie. it built and didn't break 32-bit inodes, but had code and testing gaps for actual 64-bit inode numbers).

Landing the dirdata patch has now become more complex because the "fscrypt+case-insensitive" features are now storing extra information after the name in the dirent. This would have been a perfect case to use dirdata and get it included, except Google had already started using this in Android and didn't want to update the format, so there needs to be additional compatibility added for this case.

An agreed-upon proposal to get dirdata accepted without the Lustre kernel dependency is to add a dedicated test ioctl interface to store dirdata in directory entries, and confirm that it is being handled properly (eg. across rename, with e2fsck, etc). For compatibility with Lustre, this could store the "inode|generation" into the Lustre FID slot, since this is equivalent to an IGIF FID, and is also easy to verify later.

Generated at Sat Feb 10 01:58:19 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.