[LU-5603] Enable inline_data feature for Lustre Created: 10/Sep/14  Updated: 19/Dec/23

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: New Feature Priority: Major
Reporter: Wang Shilong (Inactive) Assignee: Dongyang Li
Resolution: Unresolved Votes: 0
Labels: ldiskfs, patch

Issue Links:
Related
is related to LU-11589 kernel BUG at ldiskfs.h:1907! Open
is related to LU-9627 Bad small-file behaviour even when lo... Open
is related to LU-16355 batch dirty buffered write of small f... Open
is related to LU-16921 CSDC: ensure that DoM and CSDC work t... Closed
Sub-Tasks:
Key       Summary                         Type            Status  Assignee
LU-11589  kernel BUG at ldiskfs.h:1907!   Technical task  Open    WC Triage
Rank (Obsolete): 15672

 Description   

We found that directory creation on ldiskfs (ext4) is much slower than file creation.

Under RHEL7 with ldiskfs, we got the following results:

    ./mdtest -d /mnt/test/ -n 100000 -i 3

Directory creation: 52299.122 ops/second
File creation: 106569 ops/second

As we can see, directory creation is roughly half as fast as file creation. After some profiling, we found that the cost difference comes from the following calls:

->ext4_mkdir()
->ext4_init_new_dir()
->ext4_append()
->ext4_bread()
->ext4_getblk()
->ext4_map_blocks()
So the extra block allocated to hold the '.' and '..' entries costs extra time. Enabling inline_data avoids this block allocation for new directories, and things got better:

Directory creation: 111276.454 ops/second
File creation: 114920.338 ops/second

With inline_data enabled, directory creation performance is about the same as file creation.
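The ratios behind "roughly half as fast" and "about the same" can be checked with a quick calculation using only the mdtest figures reported above. Note that file creation also differs between the two runs (106569 vs. 114920 ops/second), so some run-to-run variation is included in these numbers.

```python
# mdtest results copied from the two runs above (ops/second).
baseline = {"dir_create": 52299.122, "file_create": 106569.0}
inline = {"dir_create": 111276.454, "file_create": 114920.338}


def ratio(a: float, b: float) -> float:
    """Return a / b rounded to two decimal places."""
    return round(a / b, 2)


# Directory creation relative to file creation, without and with inline_data.
dir_vs_file_before = ratio(baseline["dir_create"], baseline["file_create"])
dir_vs_file_after = ratio(inline["dir_create"], inline["file_create"])
# Speedup of directory creation itself from enabling inline_data.
dir_speedup = ratio(inline["dir_create"], baseline["dir_create"])

print(dir_vs_file_before, dir_vs_file_after, dir_speedup)  # 0.49 0.97 2.13
```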

We also found that inline_data helps not only directory creation but also directory reading. Consider the following case:

  1. Create 1,000,000 directories under the test directory, then run 'time ls -R ./test'.

The elapsed time dropped from 2m25s to 30s, a huge difference! This is because inline_data avoids the extra directory block allocation, which also speeds up reads since we don't have to do an extra block I/O for each directory.
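Putting the `ls -R` timings side by side gives the speedup and an average per-directory cost. The per-directory figure is only a simple average and assumes the elapsed time scales with the number of directories, which the measurement above does not itself establish.

```python
DIRS = 1_000_000  # number of directories created in the test above


def to_seconds(minutes: int, seconds: int) -> int:
    """Convert an "XmYs" wall-clock time into seconds."""
    return minutes * 60 + seconds


before = to_seconds(2, 25)  # 2m25s without inline_data
after = to_seconds(0, 30)   # 30s with inline_data
speedup = before / after

# Average wall-clock time per directory, in microseconds.
us_per_dir_before = before * 1e6 / DIRS
us_per_dir_after = after * 1e6 / DIRS

print(f"{speedup:.1f}x faster")  # 4.8x faster
print(f"{us_per_dir_before:.0f} us -> {us_per_dir_after:.0f} us per directory")
```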

In general, inline_data can improve performance and also reduce space usage.

inline_data is now included in RHEL7. There seem to be some conflicts with the Lustre dirdata feature, but inline_data deserves a look: adding it to Lustre should give us some improvement, though we still need to confirm this under Lustre.



 Comments   
Comment by Peter Jones [ 10/Sep/14 ]

Hi there

Is this an issue that you are planning to work on yourself or are you reporting it in the hope that somebody else will implement your suggestion?

Thanks

Peter

Comment by Shuichi Ihara (Inactive) [ 10/Sep/14 ]

Peter,
We would like to work on this. Metadata performance (directory creation and removal) is currently a limitation. To confirm this, we did some basic testing and found that "inline_data" significantly improves the performance of directory metadata operations with ext4. It (or a similar approach) is one of the good candidates for breaking through the current limit on Lustre directory operation performance.
However, some discussion is still required to move forward, since "inline_data" and "dirdata" are not compatible.
First of all, we would like to test Lustre either with "inline_data", or keeping "dirdata" plus something similar to what "inline_data" does, to make sure this idea works with Lustre as well.

Comment by Andreas Dilger [ 16/Sep/19 ]

This feature will unfortunately not help with the IO-500 mdtest-hard-write and mdtest-hard-read, since the data cannot quite fit within the MDT inode even if it was formatted with a 4KB inode size. The mdtest-hard-write parameters create 3901-byte files; together with the 160 bytes of the core inode and about 256 bytes of other xattrs, they are too large to fit even into a 4096-byte inode, if that were in use (which it is not).

However, for the default 1024-byte MDT inode size, this would still improve performance for files below approximately 600 bytes in size, or directories with fewer than about 15 average-length (32-byte) filenames. There are definitely some workloads (e.g. OpenFoam, at least in some uses) that may benefit significantly from faster small/empty directory creation speed, and some workloads have a lot of small files.
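The size estimates above can be reproduced with some simple arithmetic. This sketch follows the accounting in the comment (160-byte core inode, about 256 bytes of other xattrs) and sizes directory entries with the standard ext4 rule (an 8-byte header plus the name, rounded up to a multiple of 4). It ignores details that shift the real numbers somewhat: ext4 actually keeps the first 60 bytes of inline data in `i_block` inside the core inode, the `system.data` xattr has its own header, and Lustre's dirdata would add an FID to each entry.

```python
CORE_INODE = 160     # core inode size, as stated in the comment above
OTHER_XATTRS = 256   # approximate space used by other xattrs, as stated above
DIRENT_HEADER = 8    # ext4 dirent: inode(4) + rec_len(2) + name_len(1) + file_type(1)


def inline_capacity(inode_size: int) -> int:
    """Rough number of bytes left for inline data in an inode of this size."""
    return inode_size - CORE_INODE - OTHER_XATTRS


def dirent_len(name_len: int) -> int:
    """ext4 directory entry length: header + name, rounded up to 4 bytes."""
    return (DIRENT_HEADER + name_len + 3) & ~3


# mdtest-hard-write files (3901 bytes) do not fit even in a 4096-byte inode.
cap_4k = inline_capacity(4096)
print(cap_4k, cap_4k >= 3901)  # 3680 False

# Default 1024-byte MDT inode: roughly 600 bytes available.
cap_1k = inline_capacity(1024)
print(cap_1k)  # 608

# How many entries with 32-byte names fit in that space.
print(cap_1k // dirent_len(32))  # 15
```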

Comment by Andreas Dilger [ 27/May/21 ]

The inline_data feature will also help in the case of agent directory inodes created for remote/striped directories. Instead of allocating a separate directory block to hold only "." and "..", it would be possible to store the ".." entry + parent FID into the directory inode itself. There is no need to store a "." entry, since this is stored in the inode itself and can be generated from inode->i_ino and the self FID stored in trusted.lma.
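To give a sense of how little space such an agent-inode entry would need, the sketch below packs a parent inode number followed by a parent FID. The layout is HYPOTHETICAL and chosen only for illustration; nothing in this ticket defines it. The FID size matches Lustre's `struct lu_fid` (64-bit sequence, 32-bit object id, 32-bit version), and `i_block` is the 60-byte block map area inside an ext4 inode. The example inode number and FID values are made up.

```python
import struct
from dataclasses import dataclass

I_BLOCK_SIZE = 60  # ext4 i_block: 15 x 32-bit words inside the inode


@dataclass(frozen=True)
class LuFid:
    seq: int  # 64-bit sequence number
    oid: int  # 32-bit object id within the sequence
    ver: int  # 32-bit version

    def pack(self) -> bytes:
        """Little-endian encoding of the 16-byte FID."""
        return struct.pack("<QII", self.seq, self.oid, self.ver)


def pack_agent_dotdot(parent_ino: int, parent_fid: LuFid) -> bytes:
    """HYPOTHETICAL inline layout: 32-bit parent inode number + parent FID.

    No "." entry is stored; it would be derived from the inode's own number
    and the self FID kept in trusted.lma.
    """
    payload = struct.pack("<I", parent_ino) + parent_fid.pack()
    if len(payload) > I_BLOCK_SIZE:
        raise ValueError("entry does not fit in i_block")
    return payload


blob = pack_agent_dotdot(12345, LuFid(seq=0x200000401, oid=0x1, ver=0))
print(len(blob))  # 20
```

Under these assumptions the whole ".." record needs 20 bytes, compared with a separate directory block (typically 4096 bytes) holding only "." and "..".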

Comment by Stephane Thiell [ 28/Nov/23 ]

As far as I know, the dirdata feature is still not available in upstream ext4 and is currently maintained as an out-of-tree feature for Lustre via an ldiskfs patch (ext4-data-in-dirent.patch). This leads to possible incompatibilities when trying to enable newer ext4 features like inline_data with Lustre. I just tried to enable it on EL 9.2 using mkfs.lustre -O inline_data ... and the underlying mke2fs command returns "The dirdata feature can not enabled with inline data feature.". If I try to format with -O ^dirdata,inline_data, the MDT complains about missing dirdata and then crashes (kernel BUG) in ldiskfs (this is with master from a couple of weeks ago).

Is there any solution in sight so that Lustre could benefit from inline_data? It seems it would give some performance benefits (maybe not for the IO-500 benchmarks, but more likely for real-life use cases) and could also greatly reduce the blocks consumed by small directories. Our own interest in inline_data is not performance-related (although that is always welcome), but rather in storing a very large number of very small directories in Lustre so that it can be used more efficiently as a MinIO disk.

Comment by Andreas Dilger [ 28/Nov/23 ]

Stephane, yes the incompatibility with upstream is known to me. That is why we haven't enabled this feature yet. It will need some ldiskfs development effort to allow these features to work together.

Generated at Sat Feb 10 01:52:54 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.