[LU-6220] push ext4/ldiskfs patches upstream if possible Created: 06/Feb/15 Updated: 13/Jan/24 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.8.0 |
| Fix Version/s: | Upstream |
| Type: | Improvement | Priority: | Minor |
| Reporter: | Andreas Dilger | Assignee: | Yang Sheng |
| Resolution: | Unresolved | Votes: | 1 |
| Labels: | None | ||
| Issue Links: |
|
||||||||||||||||||||||||||||||||||||
| Rank (Obsolete): | 17396 | ||||||||||||||||||||||||||||||||||||
| Description |
|
There are a number of ldiskfs patches that we could potentially push upstream, possibly with some cleanups. All patches should be run through checkpatch.pl first.
|
| Comments |
| Comment by Yang Sheng [ 24/Jul/15 ] |
|
As latest 4.2 kernel status: So just 3 patches need to push upstream. |
| Comment by James A Simmons [ 24/Jul/15 ] |
|
Only 3 more. A lot more patch exist that need to merged upstream to make ldiskfs patchless. |
| Comment by Yang Sheng [ 13/Aug/15 ] |
|
From http://review.whamcloud.com/#/c/15961/ for record Andreas Dilger 4:09 AM Patch Set 5: Code-Review+2 This is much better than before, but not 100% safe if there is another flag used by ext4. It might be useful to try to push a patch upstream that puts BH_BITMAP_DIRTY into an enum with a "last" value to allow us to ensure that BH_DXLock does not conflict. The reason for the upstream patch could be to verify that BH_BITMAP_DIRTY itself does not overflow the b_state field. |
| Comment by Andreas Dilger [ 11/Apr/18 ] |
|
The ext4-kill-dx-root, ext4-large-dir, and ext4-disable-mbcache patches were landed in the upstream 4.12 kernel. The ext4-large-eas patch was landed in the upstream 4.13 kernel. The ext4-data-in-dirent patch is scheduled to land for the upstream 4.17 kernel. In the 4.15 kernel, upstream commit ae5e165d855 and commit ee73f9a52a34 changed how i_version is handled by the VFS and the filesystems. We may be able to leverage this change to remove the ext4-inode-version patch. In particular, if we do not set the SB_I_VERSION flag in the superblock, the VFS will no longer modify the i_version field in the inode, so we may be able to avoid storing the Lustre version in struct ext4_inode.i_fs_version, and can use struct inode.i_version for this (save 8 bytes per ldiskfs inode and an ldiskfs patch). The ext4-nocmtime patch took a bit of a step backward in kernel 4.9, since the ext4_current_time() wrapper that we were using so conveniently was removed and replaced with dozen(s) of direct calls to current_time() from the VFS. The ext4-mmp-brelse patch equivalent landed upstream in kernel 4.5. |
| Comment by James A Simmons [ 13/Apr/18 ] |
|
What is left? Its sounds like we are almost complete for this work. |
| Comment by Andreas Dilger [ 19/Apr/18 ] |
|
The current ldiskfs-4.4-sles12sp3 patch series looks like the following (patches marked '*' are/will be merged soon): sles12sp2/ext4-inode-version.patch sles12sp2/ext4-lookup-dotdot.patch sles12sp2/ext4-print-inum-in-htree-warning.patch sles12sp2/ext4-prealloc.patch sles12sp2/ext4-osd-iop-common.patch sles12sp2/ext4-misc.patch sles12sp3/ext4-mballoc-extra-checks.patch sles12sp2/ext4-hash-indexed-dir-dotdot-update.patch sles12sp2/ext4-kill-dx-root.patch * rhel7/ext4-mballoc-pa-free-mismatch.patch sles12sp2/ext4-data-in-dirent.patch * sles12sp2/ext4-large-eas.patch * sles12sp2/ext4-disable-mb-cache.patch * rhel7/ext4-nocmtime.patch sles12sp2/ext4-large-dir.patch * sles12sp2/ext4-pdirop.patch sles12sp2/ext4-max-dir-size.patch rhel7/ext4-remove-truncate-warning.patch sles12sp3/ext4-corrupted-inode-block-bitmaps-handling-patches.patch * (just under discussion) sles12sp2/ext4-give-warning-with-dir-htree-growing.patch sles12sp2/ext4-mmp-brelse.patch * rhel7/ext4-jcb-optimization.patch sles12sp2/ext4-attach-jinode-in-writepages.patch sles12sp2/ext4-dont-check-before-replay.patch rhel7.2/ext4-dont-check-in-ro.patch sles12sp2/ext4-fix-xattr-shifting-when-expanding-inodes.patch * rhel7/ext4-use-GFP_NOFS-in-ext4_inode_attach_jinode.patch There may be slightly different patches in the RHEL7 kernel series, but it is against a significantly older kernel so I didn't want to use that as a starting point. This shows there are definitely still a lot of patches to be merged and/or worked around in some other way. At least several of the major on-disk features have been landed, and most of these are only related to in-core changes (API, functionality), so the risk of incompatible on-disk formats is significantly reduced. I'm hoping that e2fsprogs-1.45 might even be able to run e2fsck against an OST or MDT, though I don't think it would be advised without more testing. |
| Comment by Yang Sheng [ 28/Apr/20 ] |
|
Move from https://review.whamcloud.com/38372 as record
|
| Comment by Andreas Dilger [ 11/Jun/20 ] |
|
Patch https://review.whamcloud.com/38372 " |
| Comment by James A Simmons [ 27/Sep/20 ] |
|
Track the parallel VFS + ext4 lookup for upstream: https://patchwork.kernel.org/patch/10695037/ |
| Comment by Andreas Dilger [ 09/Jun/22 ] |
|
The data-in-dirent patch was being discussed for upstream inclusion for 64-but inode support, but eventually that was not accepted because the 64-bit inode patch was only partially tested (ie. it built and didn't break 32-bit inodes, but had code and testing gaps for actual 64-bit inode numbers). Landing the dirdata patch has now become more complex because the "fscrypt+case-insensitive" features are now storing extra information after the name in the dirent. This would have been a perfect case to use dirdata and get it included, except Google had already started using this in Android and didn't want to update the format, so there needs to be additional compatibility added for this case. An agreed-upon proposal to get dirdata accepted without the Lustre kernel dependency is to add a dedicated test ioctl interface to store dirdata in directory entries, and confirm that it is being handled properly (eg. across rename, with e2fsck, etc). For compatibility with Lustre, this could store the "inode|generation" into the Lustre FID slot, since this is equivalent to an IGIF FID, and is also easy to verify later. |