[LU-14712] improve ldiskfs "-o discard" performance Created: 26/May/21  Updated: 15/Dec/23

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement
Priority: Minor
Reporter: Andreas Dilger
Assignee: Dongyang Li
Resolution: Unresolved
Votes: 2
Labels: NVME, ldiskfs, performance

Issue Links:
Related
is related to LU-16750 optimize ldiskfs internal metadata al... Open

 Description   

The current "-o discard" mount option for ldiskfs enables on-the-fly TRIM of underlying flash devices (or thinly-provisioned LUNs). However, the current implementation hurts performance because the TRIM is issued synchronously from a commit callback in the context of the JBD2 commit thread, which blocks later transaction commits.

Work is under way in the upstream kernel to improve "-o discard" by using an async worker thread to issue the TRIM commands, and by issuing them on a per-blockgroup basis rather than on a per-extent basis.

Current patch series is at:
https://marc.info/?l=linux-ext4&m=162201857620045&w=4
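
To illustrate the idea only (this is not the upstream patch series), the following minimal Python sketch shows a commit callback that merely records freed extents, with a separate worker step that later merges them per block group and issues the TRIMs. The names DiscardQueue, merge_extents, issue_trim and the BLOCKS_PER_GROUP value are all hypothetical, and the sketch assumes that a freed extent never crosses a block group boundary. A real implementation would run the worker from a kernel workqueue with proper locking.

```python
from collections import defaultdict

BLOCKS_PER_GROUP = 32768  # assumed group size, for illustration only


def merge_extents(extents):
    """Merge overlapping or adjacent (start, length) extents."""
    merged = []
    for start, length in sorted(extents):
        if merged and start <= merged[-1][0] + merged[-1][1]:
            last_start, last_len = merged[-1]
            merged[-1] = (last_start, max(last_len, start + length - last_start))
        else:
            merged.append((start, length))
    return merged


class DiscardQueue:
    """Hypothetical deferred-discard queue, keyed by block group."""

    def __init__(self, issue_trim):
        self._issue_trim = issue_trim  # callable(group, start, length)
        self._pending = defaultdict(list)

    def on_commit(self, freed_extents):
        """Commit callback: only record extents, never issue I/O here."""
        for start, length in freed_extents:
            self._pending[start // BLOCKS_PER_GROUP].append((start, length))

    def run_worker(self):
        """Worker step: merge extents per group, then issue the TRIMs."""
        pending, self._pending = self._pending, defaultdict(list)
        issued = 0
        for group in sorted(pending):
            for start, length in merge_extents(pending[group]):
                self._issue_trim(group, start, length)
                issued += 1
        return issued
```

In this sketch the commit path does no device I/O at all, so a slow TRIM can only delay the worker, not the next transaction commit.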

There are also patches by Shilong that make the TRIM state of each block group persistent, so that when mke2fs has already run TRIM on the device, fstrim does not resubmit TRIM requests for all of the groups again immediately after mount/remount.

Most recent patches at:
https://marc.info/?l=linux-ext4&m=159283169109297&w=4
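
As a rough sketch of how a persistent per-group flag could avoid redundant TRIM requests, the Python below models group descriptors with a "was trimmed" bit. The flag value, the GroupDesc class, and the fstrim and release_blocks helpers are hypothetical names for illustration, not the on-disk format or code from the patches. It also assumes that freeing blocks in a group clears the flag so the group becomes eligible for TRIM again.

```python
from dataclasses import dataclass

BG_WAS_TRIMMED = 0x1  # hypothetical flag bit, not the real on-disk value


@dataclass
class GroupDesc:
    """Simplified stand-in for a block group descriptor."""
    flags: int = 0
    free_count: int = 0


def fstrim(groups, trim_group):
    """Trim only groups not already marked; return the groups trimmed."""
    issued = []
    for idx, gd in enumerate(groups):
        if gd.flags & BG_WAS_TRIMMED:
            continue  # state persisted from mke2fs or an earlier fstrim
        if gd.free_count > 0:
            trim_group(idx)
            issued.append(idx)
        gd.flags |= BG_WAS_TRIMMED
    return issued


def release_blocks(groups, idx, count):
    """Freeing blocks invalidates the trimmed state of that group."""
    gd = groups[idx]
    gd.free_count += count
    gd.flags &= ~BG_WAS_TRIMMED
```

With the flag stored in the group descriptor, a second fstrim pass (or the first pass after mke2fs trimmed the device) finds nothing to do until blocks are freed again.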

I think the combination of these two patch series would significantly improve ongoing ldiskfs performance on flash devices.



 Comments   
Comment by Andreas Dilger [ 09/May/23 ]

Patches to implement the EXT4_BG_WAS_TRIMMED flag to persistently store the TRIM state for each block group are available at:

e2fsprogs patches:
https://patchwork.ozlabs.org/project/linux-ext4/patch/1590588525-29669-1-git-send-email-wangshilong1991@gmail.com/
https://patchwork.ozlabs.org/project/linux-ext4/patch/1590588525-29669-2-git-send-email-wangshilong1991@gmail.com/

ext4 kernel patch:
https://patchwork.ozlabs.org/project/linux-ext4/patch/1592831677-13945-1-git-send-email-wangshilong1991@gmail.com/
https://patchwork.ozlabs.org/project/linux-ext4/patch/1592835419-7841-1-git-send-email-wangshilong1991@gmail.com/

It seems possible to use the EXT4_BG_WAS_TRIMMED tracking to optimize the fstrim operation, and also to reduce "-o discard" overhead by tracking groups instead of keeping an active list of every allocation extent to trim:

  • keep a list of groups to be trimmed on the superblock
  • at mount time during group descriptor load, add groups that need to be trimmed (no EXT4_BG_WAS_TRIMMED flag) to the list
  • skip TRIM for groups that do not have many large extents (easily seen from mballoc buddy bitmap)
  • add a group to list during runtime when many blocks in the group have been freed
  • keep a simple "active" state for the filesystem (e.g. open journal transaction, read or write in the past N seconds?)
  • start an fstrim_DEVNO thread in the background when "-o discard" (or "-o fstrim") is used
  • wake up fstrim_DEVNO when filesystem is otherwise idle and trim inactive groups on superblock list
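
The steps above can be sketched roughly as follows. This is a minimal single-threaded Python model, not proposed kernel code: IdleTrimmer, its method names, and the idle_secs, min_large_extents and free_threshold parameters are all hypothetical, and the background thread is omitted (run_once stands in for one wakeup of it). The sketch also assumes that groups skipped for lack of large free extents stay on the list for a later pass, which the list above does not specify.

```python
import time


class IdleTrimmer:
    """Hypothetical model of an idle-time, per-group trim scheduler."""

    def __init__(self, trim_group, idle_secs=5.0, min_large_extents=4,
                 free_threshold=1024, clock=time.monotonic):
        self.trim_group = trim_group
        self.idle_secs = idle_secs
        self.min_large_extents = min_large_extents
        self.free_threshold = free_threshold
        self.clock = clock
        self.pending = set()   # groups waiting to be trimmed
        self.freed = {}        # blocks freed per group since last trim
        self.last_active = clock()

    def load_group(self, group, was_trimmed):
        """Mount time: queue groups that lack the trimmed flag."""
        if not was_trimmed:
            self.pending.add(group)

    def note_activity(self):
        """Record filesystem activity (e.g. a read, write, or open transaction)."""
        self.last_active = self.clock()

    def note_freed(self, group, nblocks):
        """Runtime: queue a group once many of its blocks have been freed."""
        self.freed[group] = self.freed.get(group, 0) + nblocks
        if self.freed[group] >= self.free_threshold:
            self.pending.add(group)

    def run_once(self, large_extent_count):
        """One wakeup: trim eligible groups only if the filesystem is idle."""
        if self.clock() - self.last_active < self.idle_secs:
            return []
        done = []
        for group in sorted(self.pending):
            if large_extent_count(group) < self.min_large_extents:
                continue  # too fragmented to be worth trimming now
            self.trim_group(group)
            self.freed.pop(group, None)
            done.append(group)
        self.pending.difference_update(done)
        return done
```

Here large_extent_count is a caller-supplied function standing in for whatever the mballoc buddy bitmap would report for a group.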

Comment by Gerrit Updater [ 11/Aug/23 ]

"Li Dongyang <dongyangli@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51923
Subject: LU-14712 ldiskfs: introduce EXT4_BG_TRIMMED to optimize fstrim
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: c01f5755e88a39429f2f8af16e6ca2912de0c624

Comment by Gerrit Updater [ 11/Aug/23 ]

"Li Dongyang <dongyangli@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/tools/e2fsprogs/+/51924
Subject: LU-14712 e2fsprogs: support EXT2_FLAG_BG_TRIMMED and EXT2_FLAGS_TRACK_TRIM
Project: tools/e2fsprogs
Branch: master-lustre
Current Patch Set: 1
Commit: c352e90a9d02af8a94f4fb8e28a95016da2f2d68

Comment by Gerrit Updater [ 29/Aug/23 ]

"Andreas Dilger <adilger@whamcloud.com>" merged in patch https://review.whamcloud.com/c/tools/e2fsprogs/+/51924/
Subject: LU-14712 e2fsprogs: support EXT2_FLAG_BG_TRIMMED and EXT2_FLAGS_TRACK_TRIM
Project: tools/e2fsprogs
Branch: master-lustre
Current Patch Set:
Commit: 3bf271a76463c3d238f342e39f4f2813e7df6442

Generated at Sat Feb 10 03:12:07 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.