[LU-5603] Enable inline_data feature for Lustre Created: 10/Sep/14 Updated: 19/Dec/23 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | New Feature | Priority: | Major |
| Reporter: | Wang Shilong (Inactive) | Assignee: | Dongyang Li |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | ldiskfs, patch | ||
| Issue Links: |
| Sub-Tasks: |
| Rank (Obsolete): | 15672 |
| Description |
|
We found that ldiskfs (ext4) directory creation is much slower than file creation. Under RHEL7 with ldiskfs, we got the following results:
Directory creation: 52299.122 ops/second
As we can see, directory creation is much slower than file creation. After some profiling, we found that the cost difference comes from the following calls:
Directory creation: 111276.454 ops/second
As we can see, with inline_data enabled, directory creation performance is the same as file creation. We also found that the inline_data feature helps not only directory creation but also
time cost is reduced from 2m25s to 30s, a huge difference! This is because inline_data avoids an extra block allocation, which also speeds up reads, since we don't have to do an extra block I/O. In general, inline_data can improve performance and reduce space allocation, which is also good. inline_data is now included in RHEL7, and there seem to be some conflicts with the Lustre dirdata feature, but inline_data deserves a closer look: adding it to Lustre should give us some improvement, though we still need to confirm this under Lustre. |
| Comments |
| Comment by Peter Jones [ 10/Sep/14 ] |
|
Hi there, is this an issue that you are planning to work on yourself, or are you reporting it in the hope that somebody else will implement your suggestion? Thanks, Peter |
| Comment by Shuichi Ihara (Inactive) [ 10/Sep/14 ] |
|
Peter, |
| Comment by Andreas Dilger [ 16/Sep/19 ] |
|
This feature will unfortunately not help with the IO-500 mdtest-hard-write and mdtest-hard-read, since the data cannot quite fit within the MDT inode even if it were formatted with a 4KB inode size. The mdtest-hard-write parameters create 3901-byte files, which, together with 160 bytes of the core inode and about 256 bytes of other xattrs, are too large to fit even into a 4096-byte inode, if that were in use (which it is not). However, for the default 1024-byte MDT inode size, this would still improve performance for files below approximately 600 bytes in size, or directories with fewer than about 15 average-length (32-byte) filenames. There are definitely some workloads (e.g. OpenFoam, at least in some uses) that may benefit significantly from faster small/empty directory creation speed, and some workloads have a lot of small files. |
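The size arithmetic in the comment above can be checked with a rough back-of-envelope sketch. The 160-byte core inode and 256-byte xattr figures are taken from the comment; the function names and the 8-byte per-entry directory header are assumptions made for illustration, and the model ignores any extra per-entry data that dirdata would add. It is not how ldiskfs actually accounts for inline space.

```python
# Rough space budget for inline data in an MDT inode (illustrative only).
CORE_INODE_BYTES = 160    # core inode size, figure from the comment above
OTHER_XATTR_BYTES = 256   # "about 256 bytes of other xattrs", from the comment
DIRENT_HEADER_BYTES = 8   # ASSUMPTION: fixed header per directory entry


def inline_budget(inode_size: int) -> int:
    """Bytes left in the inode after the core inode and other xattrs."""
    return max(0, inode_size - CORE_INODE_BYTES - OTHER_XATTR_BYTES)


def file_fits_inline(file_size: int, inode_size: int) -> bool:
    """True if a file of file_size bytes fits in the remaining inode space."""
    return file_size <= inline_budget(inode_size)


def max_inline_dir_entries(inode_size: int, name_len: int = 32) -> int:
    """Approximate number of directory entries that fit inline."""
    return inline_budget(inode_size) // (DIRENT_HEADER_BYTES + name_len)


if __name__ == "__main__":
    for inode_size in (1024, 4096):
        print(inode_size,
              inline_budget(inode_size),
              file_fits_inline(3901, inode_size),
              max_inline_dir_entries(inode_size))
```

Under these assumptions a 1024-byte inode leaves 608 bytes (the "approximately 600 bytes" above) and room for about 15 entries with 32-byte names, while a 4096-byte inode leaves 3680 bytes, which is less than the 3901-byte mdtest-hard-write file.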
| Comment by Andreas Dilger [ 27/May/21 ] |
|
The inline_data feature will also help in the case of agent directory inodes created for remote/striped directories. Instead of allocating a separate directory block to hold only "." and "..", it would be possible to store the ".." entry + parent FID into the directory inode itself. There is no need to store a "." entry, since this is stored in the inode itself and can be generated from inode->i_ino and the self FID stored in trusted.lma. |
| Comment by Stephane Thiell [ 28/Nov/23 ] |
|
As far as I know, the dirdata feature is still not available in ext4 and is currently maintained as an out-of-tree feature for Lustre via an ldiskfs patch named ext4-data-in-dirent.patch. This leads to possible incompatibilities when trying to enable newer ext4 features like inline_data with Lustre. I just tried to enable it on EL 9.2 using mkfs.lustre -O inline_data ... and the underlying mke2fs command returns "The dirdata feature can not enabled with inline data feature.". If I try to format with -O ^dirdata,inline_data, the MDT complains about missing dirdata and then crashes (kernel BUG) in ldiskfs (this is with master from a couple of weeks ago). Is there any solution in sight so that Lustre could benefit from inline_data? It seems it would give some performance benefits (maybe not for the IO-500 benchmarks but more likely for real-life use cases...) and could also potentially greatly reduce the blocks consumed by small directories. Our own interest in inline_data is not performance-related (although that is always welcome)... but rather in storing a very large number of very small directories in Lustre so that it can be used more efficiently as a MinIO disk. |
| Comment by Andreas Dilger [ 28/Nov/23 ] |
|
Stephane, yes the incompatibility with upstream is known to me. That is why we haven't enabled this feature yet. It will need some ldiskfs development effort to allow these features to work together. |